CellTracksColab — A platform for compiling, analyzing, and exploring tracking data

In life sciences, tracking objects from movies enables researchers to quantify the behavior of single particles, organelles, bacteria, cells, and even whole animals. While numerous tools now allow automated tracking from video, a significant challenge persists in compiling, analyzing, and exploring the large datasets generated by these approaches. Here, we introduce CellTracksColab, a platform tailored to simplify the exploration and analysis of tracking data. CellTracksColab facilitates the compiling and analysis of results across multiple fields of view, conditions, and repeats, ensuring a holistic dataset overview. CellTracksColab also harnesses the power of high-dimensional data reduction and clustering, enabling researchers to identify distinct behavioral patterns and trends without bias. Finally, CellTracksColab also includes specialized analysis modules enabling spatial analyses (clustering, proximity to specific regions of interest). We demonstrate CellTracksColab capabilities with three use cases, including T-cells and cancer cell migration, as well as filopodia dynamics. CellTracksColab is available for the broader scientific community at https://github.com/CellMigrationLab/CellTracksColab.


Introduction
In life science, tracking has emerged as an indispensable tool for unparalleled insights into dynamic molecular and cellular behaviors.Parallel to this, segmentation methods relying on machine learning and deep learning are now greatly facilitating the implementation of complex tracking pipelines [1][2][3][4] ; enabling the quantitative analysis of these dynamic behaviors.Yet, as the capabilities of tracking tools have expanded, so too have the challenges associated with analyzing the resulting data.
Multiple tools have been developed to help researchers compile tracking data; for instance, these include the Ibidi Chemotaxis tool (a Fiji plugin), the MotilityLab website (an online platform for CelltrackR 5 ), or TrackMateR 6 .These traditional analytical approaches, implemented by us and many others, typically reduce tracking datasets to population-level analyses where track metrics are averaged across different conditions.Yet, while practical, such analyses overlook the heterogeneity within biological data.Over the past two years, multiple tools, including CellPhe (an R toolbox 7 ), Traject3D (a collection of MATLAB scripts 8 ), and CellPlato (a Python toolbox 9 ), have been designed to harness the high-dimensionality of tracking datasets to assist in the unbiased discovery of rare phenotypes.Still, these tools often remain difficult to implement for users with no or little coding expertise.
Here, we present CellTracksColab, a Python-based platform to streamline the analysis of tracking datasets.This platform is specifically designed for researchers, particularly those with limited programming expertise, facilitating the exploration and analysis of tracking data.CellTracksColab leverages the power of Jupyter notebooks, which blend live code execution with comprehensive documentation access.CellTracksColab can run locally and in the cloud, accommodating diverse user preferences and resource availability.Drawing on successful models like ColabFold 10 and ZeroCostDL4Mic 11 , CellTracksColab is fully integrated within the Google Colaboratory framework (Colab).Through a simplified workflow, researchers can install essential software dependencies with a few mouse clicks, upload their tracking data, and run their analyses.CellTracksColab extends beyond visualization and population analyses, empowering researchers to delve into the nuanced dynamics and behaviors encapsulated within their tracking experiments.We first describe CellTracksColab's architecture.Then, we demonstrate CellTracksColab features and capabilities in studying T-cells and cancer cell migration, and filopodia dynamics.

Results
The CellTracksColab framework.
The CellTracksColab platform comprises a collection of Jupyter notebooks designed to streamline tracking data analysis (Fig. 1A).CellTracksColab can be run locally or in cloud services such as Google Colab, which provides users free access to computing resources that simplify the user experience by eliminating the need for local installations.
CellTracksColab is designed to process tracking data from various open-source tracking software, including TrackMate 1 , CellProfiler 12 , Icy 13 , ilastik 14 , and the Fiji Manual Tracker 15 .CellTracksColab supports tracking data stored in XML (TrackMate) and CSV formats (TrackMate, CellProfiler, Icy, ilastik, and Fiji Manual Tracker).CellTracksColab can also be made compatible with other tracking tools that export results that follow our minimal requirements (see documentation for details).To facilitate a structured analysis, users are advised to organize their files into directories representing different experimental conditions and biological repeats.This organizational strategy is crucial for accurately categorizing and analyzing the dataset, considering various aspects such as experimental conditions, biological replicates, and fields of view.By promoting structured data management, CellTracksColab streamlines the analytical process and enhances the exploration and understanding of data variability and heterogeneity across the dataset.
The performance of CellTracksColab is limited by the resources available to the user, particularly the amount of RAM available, which can limit the volume of data that can be processed.However, optimization of the underlying code has been executed to ensure maximal efficiency in resource utilization.For instance, we analyzed more than 50,000 tracks (> 3 million objects from 117 videos) using CellTracksColab and the free version of Google Colab (all results presented in the manuscript can be replicated with the free version of Google Colab).CellTracksColab could accommodate one of our larger datasets encompassing over 536,000 tracks (> 56 million objects from 300 videos), but this required the additional RAM that Google Colab Plus provides.(F) Measurement and analysis of object-to-region proximity using CellTracksColab.This example demonstrates the platform's utility in quantifying the distance of objects (marked as yellow dots) relative to a defined region of interest (denoted by the white edge).The tool allows tracking these distances over time and computing related metrics, facilitating in-depth spatial analysis.

Analyzing data using CellTracksColab
When the tracking data is loaded into CellTracksColab, it is automatically exported into the CellTracksColab format.This standardized format ensures consistent data access and manipulation, facilitating thorough analysis of the tracking data across the platform.Once exported, users can, for example, visualize (Fig. 1B), filter, and smooth tracks (Fig. S1).Track smoothing using moving averages can prove to be particularly beneficial before the computation of directionality metrics, especially when the tracked object exhibits jitteriness (e.g., nuclei) and the user's interest lies in discerning the overall movement of the cell.
Upon loading data, CellTracksColab can compute various track metrics or import them directly from prior analyses conducted in the tracking software (Fig. 1A).This is ensured by the flexible design of the CellTracksColab format, which provides the aggregation of additional metrics without affecting the content of the original dataset.Users can then generate box-plots illustrating the distribution of different track metrics of interest (Fig. 1C).Additionally, several relevant statistical metrics are calculated, such as Cohen's d value -which quantifies the standardized effect size between groups and is less sensitive to sample size variations-and the p-values of statistical hypothesis tests -which compare the distribution of track metrics across conditions.The statistical tests available include a randomization test that assesses the distribution of Cohen's d values obtained with bootstrapping and t-tests that compare the mean value distributions obtained from bootstrapping, following the SuperPlots methodology 16 .Both tests are available with and without Bonferroni Correction, which adjusts the p-value to account for multiple comparisons (Fig. 1C).CellTracksColab also enables users to perform quality control on their dataset, such as checking that their data is balanced between repeats and conditions.Namely, the user can resample unbalanced data before plotting the track metrics of interest.In addition, CellTracksColab can compute similarities across different experimental conditions and replicates using various track metrics to ensure data reliability and meaningful analysis.The results are visualized using hierarchical clustering in the form of dendrograms, which aids in comparing similarities within and across different conditions and identifying outliers.
Inspired by CellPlato 9 , CellTracksColab integrates Uniform Manifold Approximation and Projection (UMAP) or t-distributed Stochastic Neighbor Embedding (t-SNE) combined with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to explore the inherent heterogeneity within tracking datasets unbiasedly (Fig. 1D).This combination allows for dimensionality reduction and effective clustering of tracks.The platform provides capabilities to plot track metrics for each cluster, creates heatmaps for an overview of data variability, and identifies exemplar tracks, representatives of each cluster, for detailed analysis.CellTracksColab also includes specialized spatial analysis modules that enable the spatial analysis of track data.These modules enable, for instance, the assessment of track clustering (Fig. 1E) or calculating the proximity of tracks to specific regions of interest (Fig. 1F).These tools facilitate the discovery of distinct subpopulations or behaviors within the data, and also serve dual purposes: identifying actual clusters and categorizing data for comparative fingerprinting.
Importantly, PDF files of all plots and CSV files encapsulating all plot data are exported, enabling users to visualize and revisit the results using their preferred software platforms.Due to the platform's flexible design, we anticipate the addition of new analysis modules, both by our team and the user community.

Exploring T-cell migration using CellTracksColab
To showcase the capabilities of CellTracksColab, we first chose to reanalyze a small dataset of T-cells migrating on either Vascular cell adhesion protein 1 (VCAM) or Intercellular Adhesion Molecule 1 (ICAM), captured through brightfield microscopy (Fig. 2A) 1,17,18 .Automated cell tracking was achieved using StarDist and TrackMate algorithms 1 .The dataset encompasses ten videos spread across two conditions and three biological repeats.CellTracksColab compiled this dataset using Colab in a few seconds, incorporating 2,297 tracks and 38,852 tracked objects.
After the computation of additional metrics, we first evaluated the dataset's balance and variability across different fields of view.Although the dataset exhibited some condition imbalance, we opted against resampling due to its relatively small size (Fig. S2A).Intriguingly, the FOV-based clustering analysis unveiled an unexpected alignment between two FOVs from the ICAM condition with those from the VCAM condition, hinting at potential similarities in cell tracking patterns (Fig. S2B).The condition and repeat-based clustering analysis further corroborated this observation.Specifically, the analysis revealed that the ICAM second biological repeat displayed a clustering pattern remarkably similar to those observed in VCAM repeats (Fig. S2C).This analysis indicates that this particular ICAM biological repeat does not behave as the other two, providing valuable information.
We further utilized CellTracksColab to plot key track metrics, including mean speed and directionality of T-cell migration.Our analysis confirmed that T-cells exhibit slower and less directional movement on VCAM than on ICAM surfaces (Fig. 2B).To delve deeper, we employed UMAP for dimensionality reduction, followed by HDBSCAN clustering.This approach revealed the presence of at least five distinct behavioral clusters within the cell population, suggesting varied migration patterns among T-cells (Fig. 2C-D).A fingerprinting plot then provided insights into the distribution of ICAM and VCAM tracks across these clusters, highlighting differing proportions (Fig. 2E).A notable observation was the much higher percentage of ICAM tracks in cluster 3 compared to VCAM and a higher percentage of VCAM tracks in cluster 1 compared to ICAM.CellTracksColab generates a heatmap representing the Z-score of available track metrics for each cluster to facilitate rapid metric comparison across clusters (Fig. 2F).Cluster 3 comprises fast and more directional tracks.In contrast, cluster 1 primarily comprises very slow-migrating cells (Fig. 2G).Finally, we compared track metrics between the ICAM and VCAM conditions within specific clusters.Focusing solely on tracks in cluster 4 (a cluster composed of migrating cells), we observed that amongst motile cells, cells plated on ICAM migrated faster and tended to be more circular than those on VCAM (Fig. 2H).While we provide only brief examples here, we can delve deeper into the analysis, identify tracks belonging to each cluster, and match them back to the original video.Further analyses will depend on the user's interest in the biological phenomenon studied.This multifaceted analysis underscores CellTracksColab's utility in offering nuanced insights into cell migration dynamics under different conditions.(G) The 'track mean speed,' track 'directionality,' and 'mean (cell) circularity' metrics for each cluster are summarized in a Tukey boxplot format as in (B).(H) The 'track mean speed,' track 'directionality,' and 'mean (cell) circularity' metrics for each condition for Cluster 4 are summarized in a Tukey boxplot format as in (B).For all box plots, the vertical whiskers extend to data points within 1.5× the interquartile range, and the values for each track are shown as dots.Each biological replicate is displayed next to each other from R1 to R3 (left to right).Plot axes are limited to 10x the interquartile range.

Studying cancer cell migration using CellTracksColab
Next, we analyzed a new dataset of collectively migrating cancer cells (Fig. 3A).In this dataset, breast cancer cells expressing a CTRL shRNA or a shRNA targeting the filopodial protein MYO10 were allowed to migrate either beneath a collagen gel or within standard media (data describing the migration behavior of these cells in standard media was previously reported here 19 ).Automated cell tracking was achieved using StarDist and TrackMate algorithms 1 .This larger dataset encompasses 117 fields of view (videos) spread across four conditions and three biological repeats.CellTracksColab compiled this dataset in around 6 minutes using Colab, storing 49,268 tracks and 3,262,747 tracked objects.
As with the T-cell dataset, we first performed quality control steps after computing additional track metrics.In the case of the breast cancer cell dataset, our quality control revealed some challenges: First, the third biological repeat (R3) did not cluster with the others (Fig. S3A).Additionally, the dataset was unbalanced, with the third repeat contributing disproportionately more tracks (Fig. 3B).Together, this analysis could indicate an issue with this third biological repeat and signal that the experiments might need to be repeated a fourth time.In addition, given this imbalance, we deemed it imperative to resample the dataset to ensure that R3's data does not unduly influence the overall conclusions.To ensure the robustness of the resampling, CellTracksColab allows for performing a statistical comparison between the original and resampled data per condition and track metric.The outcomes of this comparison are succinctly visualized in a heatmap, providing a clear and accessible way to assess the effects of resampling on the dataset's overall distribution (Fig. S3B).Post-resampling, the dataset contains 1,337 tracks for each condition and repeat (Fig. S3C).In our analysis of the resampled dataset using CellTracksColab, we focused on key track metrics such as mean speed and directionality to elucidate patterns of collective migration.We find that shMYO10 cells migrate slower than control cells, a pattern that persists with cells migrating beneath the collagen gel (Fig. 3C and Fig. S4A).Intriguingly, a closer examination of individual tracks revealed that cells in the third biological repeat (R3) exhibited a faster movement, yet this did not alter the overall migration differences observed in the dataset (Fig. 3C).Without a collagen gel, we observed no significant differences in the directionality of migration between the conditions (Fig. 3C and Fig. S4B).However, MYO10-silenced cells displayed increased directionality under the collagen gel compared to control cells, which was an unexpected finding (Fig. 3C).
Further analysis utilizing 2D UMAP projections of the whole dataset revealed challenges in cluster generation likely due to the similarity in track characteristics, a common occurrence in collective migration (Fig. 3D).Nevertheless, by employing the Canberra distance, distinct clusters were successfully delineated (Fig. S4C).These clusters provided a clear 'fingerprint' for each condition, with cluster 2 highlighting key differences between conditions (Fig. 3E).Cluster 2 is characterized by tracks of low speed but high directionality (Fig. S4D).Within cluster 2, MYO10-silenced cells showed increased directionality compared to CTRL cells in the presence of a collagen gel (Fig. S4E).Additionally, CTRL cells in this cluster moved faster than their MYO10-silenced counterparts (Fig. S4F).
In the study of collective migration, a critical aspect to consider is the distinct behavior of leading cells compared to those positioned further from the leading edge 20 .To address this, we used the CellTracksColab spatial analysis module to measure each track's distance to the leading edge over time (Fig. 3F).Interestingly, overall, we observed no significant differences among the conditions in the cells' capability to target the leading edge (Fig. 3G and Fig. S4E).Yet, the 'Direction Movement' metric revealed a complex scenario: while, on average, cell distance to the leading edge does not change between the beginning and the end of the track, the data distribution was broad.Many tracks were found closer to the leading edge by the end of the tracking period, while others were found to be further away (Fig. 3G).
Delving deeper, CellTracksColab allowed us to segregate the tracks based on their maximum distance to the leading edge, distinguishing between tracks proximal to and distant from the monolayer edge.This stratified analysis unveiled that, across all conditions, cells closer to the leading edge exhibited more directional movement compared to those further away (Fig. 3H).Notably, in the presence of a collagen gel, silencing MYO10 resulted in more directional movement, irrespective of the cells' proximity to the leading edge (Fig. 3H).This example highlights how CellTracksColab can help extract spatial insights from tracking data.Furthermore, the spatial metrics derived from these analyses can enrich dimensionality reduction analyses, potentially helping unveil additional nuanced behaviors in tracking data.

Studying filopodia dynamics using CellTracksColab
In our final example, we aimed to showcase the versatility of CellTracksColab by exploring a filopodia dynamics dataset 21 , diverging from our previous focus on cell migration.This study involved U2OS cells expressing different MYO10 constructs, a protein-inducing filopodia formation that also accumulates at their tips 22 (Fig. 4A).We tracked MYO10 puncta in live cells to investigate the dynamics of filopodia induced by three MYO10 variants: the wild type (MYO10 WT ), a mutant lacking the MyTH4/FERM domain (MYO10 ΔFERM ), and a chimera (MYO10 TH ), where MYO10's FERM domain is replaced by that from TLN1 21 .The dataset encompasses three experimental conditions, each with three biological replicates and a total of 112 videos.Utilizing CellTracksColab in Colab, we efficiently compiled this dataset, which included 91,825 tracks and nearly 1.5 million tracked objects, in around 4 minutes.
We started the analysis by filtering out tracks lasting for less than 25 seconds, resulting in a refined dataset comprising 57,487 tracks and 1,377,019 objects.Utilizing UMAP coupled with clustering analysis (Fig. 4B and Fig. S5A), we identified several distinct clusters, providing a window into the intricate behaviors of filopodia (Fig. S5B).However, the fingerprint plot revealed a similar distribution of tracks across clusters within each experimental condition (Fig. 4C).Given our focus on differences between the MYO10 constructs, we did not extensively investigate individual clusters.
Quality control assessments highlighted that the biological repeats were not clustering cohesively and revealed an imbalance in the dataset across both repeats and conditions (Fig. S5C and S5D).Consequently, we resampled the data to ensure a more balanced representation before proceeding with the analysis of track metrics.
Post-resampling, we observed notable distinctions: MYO10 WT filopodia exhibited greater stability, evidenced by longer lifetimes (track duration) and slower speeds, compared to MYO10 ΔFERM and MYO10 TH filopodia (Fig. 4D).Interestingly, MYO10 ΔFERM filopodia had a larger area than MYO10 WT , and while MYO10 TH filopodia also showed a statistically significant difference in area from MYO10 WT (p-value < 0.001), the low Cohen's d value suggests a negligible practical difference between these conditions (Fig. 4D).This observation highlights the importance of using both Cohen's d and p-values when comparing conditions, as it provides a more nuanced understanding of the data.(B) 2D UMAP projection of the entire dataset, using all available track metrics for dimensionality reduction.Data points are color-coded based on the conditions.(C) A fingerprint plot showcasing the distribution percentage of each cluster across different conditions (clustering is shown in Fig. S5A).(D) The 'track mean speed', the 'track duration, the spot 'mean area', and the track 'spatial coverage' for each condition are summarized in Tukey boxplots.The effect size (d, Cohen's d value) and the statistical significance (p, p-values from randomization tests) between the MYO10 WT and the indicated conditions are displayed.For all box plots, the vertical whiskers extend to data points within 1.5× the interquartile range, and the values for each track are shown as dots.Each biological replicate is displayed next to each other from R1 to R3 (left to right).Plot axes are limited to 10x the interquartile range.

Discussion
Here, we introduced the CellTracksColab platform, a tool designed for the life sciences community that offers a user-friendly solution for tracking data analysis.CellTracksColab integrates several functionalities for tracking data analysis, including track visualization, population analysis, statistical assessments, and dimensionality reduction.
The easiest way to start using CellTracksColab is via the Google Colaboratory framework, which significantly simplifies access and overcomes common barriers such as complex software installations.Its intuitive graphical user interface effectively bridges the gap between sophisticated computational methods and researchers with limited programming skills, making it more inclusive.However, it is essential to acknowledge the limitations of the Google Colaboratory environment.One of the primary constraints is the limited runtime, as sessions in Colab are typically capped, which can interrupt longer analytical processes.Additionally, the computing power (especially the runtime RAM) and speed offered by the free version of Colab may not suffice for massive datasets.There are also concerns regarding data privacy.To mitigate these issues, CellTracksColab can operate locally via Jupyter notebooks.Running the platform locally enables users to utilize their computational resources, providing extended runtime, increased processing power, and better control over data privacy.However, in a standard Jupyter Notebook environment, the code is exposed by default, which might make the interface seem less streamlined compared to the encapsulated Colab version.For those who prefer the Colab interface but want to use their local machine's resources, connecting Google Colab to a local Python environment is a viable option.This hybrid approach leverages the familiar Colab interface while utilizing local computational power.We believe that these three modalities-Google Colaboratory, local Jupyter notebooks, and a hybrid local Colab connection-provide comprehensive options to accommodate the preferences and needs of most users.
Existing image repositories, such as the BioImage Archive 23 and the Image Data Resource 24 , demonstrate the feasibility and value of sharing microscopy data.Analyzed tracking datasets are also very valuable; they hold significant potential for reanalysis and meta-analysis and offer a more manageable alternative to storing raw video footage 25 .Moreover, they can serve the purpose of new machine-learning tracking algorithm development and benchmarking.Yet, their widespread sharing remains limited, and publicly available analyzed datasets are relatively scarce.This scarcity is partly due to the need for a standardized format for sharing tracking results.CellTracksColab partially addresses this challenge by adopting a unified format for storing tracking data and simplifying the sharing of tracking datasets.Additionally, the platform includes a streamlined notebook designed for easy loading, viewing, and replotting previously analyzed datasets.This feature enhances data analysis's transparency and promotes reproducibility, aligning with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles in scientific research.
Despite its robust capabilities, CellTracksColab has certain limitations we aim to address in future updates.Currently, the platform has limited support for analyzing tracks in 3D space.While the platform can handle 3D + time datasets for metric computation, quality control, and dimensionality reduction tasks, it falls short in certain areas.Specifically, some of its analysis modules, including track visualization and spatial analysis, are optimized only for 2D + time datasets.In addition, CellTracksColab is not currently adapted for lineage tracing studies.Researchers focusing on lineage tracing may explore alternative tools specifically developed for such analyses 1,3,26 .
We also plan to enhance CellTracksColab by introducing additional analytical features.This includes the capability to examine time-series data, such as analyzing variations in fluorescent reporters within tracked objects over time, which could provide deeper insights into dynamic biological processes 27,28 .Furthermore, we intend to extend our data loader for other popular tracking tools, broadening the platform's compatibility and ease of integration with existing workflows.In conclusion, we believe that CellTracksColab represents a significant step forward in tracking data analysis in life sciences.Its user-friendly design and robust analytical capabilities allow researchers to explore and understand the complexities of biological motion and behavior.As we continue to develop and enhance CellTracksColab, we anticipate it becoming a useful tool in the life sciences toolkit, aiding in discovering and understanding new biological insights.

Implementation
CellTracksColab is implemented as a series of interactive Jupyter notebooks.The platform utilizes Python as its primary programming language, leveraging various libraries such as Pandas for data manipulation, Matplotlib, Seaborn for data visualization, and UMAP and HDBSCAN for dimensionality reduction and clustering analyses.The notebook's architecture facilitates ease of use while providing robust data analysis capabilities.The notebooks are structured to guide users through each step of the analysis process, from data import and preprocessing to advanced statistical and spatial analyses.Each notebook contains detailed instructions and documentation to assist users in customizing the analysis to their specific datasets.

Data and software availability
Multiple test datasets are available on Zenodo.They include the two test datasets that can be directly downloaded from within the CellTracksColab notebooks 29,30 .In addition, the three datasets showcased in this study, their tracking files, and the CellTracksColab results are also available on Zenodo [31][32][33] .
The code for CellTracksColab is publicly available under the MIT license, encouraging broad utilization and adaptation.CellTracksColab's GitHub repository serves as a dynamic platform for tracking the evolution of the code across various versions.Users are encouraged to report issues and suggest features directly through the GitHub interface.A stable version of the code and associated documentation is also archived on Zenodo 34 .

The T cell dataset.
The T cell dataset used is available on Zenodo 31 and has been detailed in previous publications 1,17,18 .In summary, Lab-Tek 8 chamber slides (ThermoFisher) were prepared by overnight coating with either 2 μg/mL ICAM-1 or VCAM-1 at a temperature of 4°C.Subsequently, activated primary mouse CD4+ T cells were cleansed and suspended in L-15 media, enriched with 2 mg/mL D-glucose.These T cells were then placed into the chamber slides and incubated for 20 minutes.Post-incubation, a gentle wash was performed to eliminate all unattached cells.The imaging process was conducted using a 10x phase contrast objective at 37°C, utilizing a Zeiss Axiovert 200M microscope equipped with an automated X-Y stage and a Roper EMCCD camera.Time-lapse imaging was executed at intervals of 1 minute over 10 minutes, employing SlideBook 6 software from Intelligent Imaging Innovations.Cells were automatically tracked using StarDist, directly called within TrackMate 1,15,35 .The StarDist model was trained using ZeroCostDL4Mic 11 and is publicly available on Zenodo 36 .This model generated excellent segmentation results on our test dataset (F1 score > 0.99).In TrackMate, the StarDist detector custom model (score threshold = 0.41 and overlap threshold = 0.5) and the Simple LAP tracker (linking max distance = 30 µm; gap closing max distance = 15 µm, gap closing max frame gap = 2 frames) were used.
In CellTracksColab, we conducted a dimensionality reduction analysis employing Uniform Manifold Approximation and Projection (UMAP).The UMAP settings were as follows: number of neighbors (n_neighbors) set to 10, minimum distance (min_dist) to 0, and number of dimensions (n_dimension) to 2. This analysis utilized an array of track metrics, including:

The breast cancer cell dataset.
The tracked breast cancer cell dataset is available on Zenodo 33 .In this experiment, approximately 50,000 shCTRL or shMYO10 lifeact-RFP DCIS.COM cells 19 were seeded into one well of an ibidi culture-insert 2 well pre-placed in a µ-Slide 8 well.The cells were cultured for 24 hours, after which the culture insert was removed to create a wound-healing assay setup.When appropriate, a fibrillar collagen gel (PureCol EZ Gel) was applied over the cells and allowed to polymerize for 30 minutes at 37°C.Standard culture media was added to all wells, and the cells were left to migrate/invade for two days 37 .Before live cell imaging, the cells were treated with 0.5 µM SiR-DNA (SiR-Hoechst, Tetu-bio) for two hours.Imaging was performed over 14 hours using a Marianas spinning-disk confocal microscope system.This system included a Yokogawa CSU-W1 scanning unit mounted on an inverted Zeiss Axio Observer Z1 microscope (Intelligent Imaging Innovations, Inc.).Imaging was conducted using a 20x (NA 0.8) air Plan Apochromat objective (Zeiss), and images were captured at 10-minute intervals.
Cell tracking was conducted using Fiji 15 and TrackMate 1 .The Stardist detector was employed to detect nuclei using the Stardist versatile model 35 .Tracks were created using the Kalman tracker (a maximum frame gap of 1, a Kalman search radius of 20 µm, and a linking maximum distance of 15 µm).Post-tracking, tracks were filtered so that each track had to contain more than six spots, ensuring a significant amount of data per track, and the total distance traveled by cells had to be greater than 89 µm.
In CellTracksColab, we conducted a dimensionality reduction analysis employing Uniform Manifold Approximation and Projection (UMAP).The UMAP settings were as follows: number of neighbors (n_neighbors) set to 10, minimum distance (min_dist) to 0, and number of dimensions (n_dimension) to 2. This analysis utilized an array of track metrics, including: The filopodia dataset.
The tracked filopodia dataset is available on Zenodo 32 and was previously described 21 .U2-OS cells expressing MYO10-GFP, a MYO10 MyTH/FERM deletion construct (EGFP-MYO10 ΔFERM ), or an MYO10/TLN1 chimera construct (EGFP-MYO10 TH ) were plated for at least 2 hours on fibronectin before the start of live imaging.Images were taken every 5 seconds at 37°C on an Airyscan microscope, using a 40x objective.MYO10 puncta were tracked using TrackMate using the custom Stardist detector and the simple LAP tracker (Linking max distance = 1 µm, Gap-closing max distance = 2 µm, Gap-closing max frame gap = 2 µm).The StarDist 2D model used was previously described 38 .Briefly, this model was trained for 200 epochs on 11 paired image patches [image dimensions: (512, 512), patch size: (512,512)] with a batch size of 2 and a mean absolute error (MAE) loss function, using the StarDist 2D ZeroCostDL4Mic notebook 11 .
In CellTracksColab, we conducted a dimensionality reduction analysis employing Uniform Manifold Approximation and Projection (UMAP).The UMAP settings were as follows: number of neighbors (n_neighbors) set to 10, minimum distance (min_dist) to 0, and number of dimensions (n_dimension) to 2. This analysis utilized an array of track metrics, including:

Figure 1 :
Figure 1: The CellTracksColab platform (A) Schematic representation of the CellTracksColab workflow.(B) Visualization of tracks in a CellTracksColab notebook.(C) Statistical analysis of track metrics using CellTracksColab.This figure shows the analysis of breast cancer cell migration (expressing CTRL shRNA or MYO10-targeting shRNA) in different environments beneath a collagen gel and standard media.The directionality metric is presented in a Tukey boxplot format.Vertical whiskers extend to data points within 1.5× the interquartile range.Each biological replicate is uniquely color-coded for clarity.Accompanying the plot are mirrored heatmaps that illustrate the effect size (Cohen's d value) and statistical significance (p-values from randomization tests) across various conditions.(D) Dimensionality reduction and clustering visualization in CellTracksColab.This panel displays a 2D t-SNE projection of the entire dataset, utilizing comprehensive track metrics for the analysis.Data points are color-coded to reflect cluster groups identified through HDBSCAN analysis on the t-SNE projection, providing insights into track characteristics and similarities.(E) Spatial clustering analysis using Ripley's L function and Monte Carlo simulations in CellTracksColab.This graph illustrates the spatial distribution of tracks, where a blue curve above the zero line indicates clustering at a specific radius in the field of view.The Monte Carlo simulation results are included to assess the statistical significance of the observed patterns.

Figure 3 :
Figure 3: Exploring cancer cell migration using CellTracksColab (A) MCF10DCIS.comlifeact-RFP cells, labeled with SiR-DNA, were recorded live using a spinning disk confocal microscope and tracked using StarDist and TrackMate.Detected nuclei and tracks (colors indicate track ID) are displayed.Scale bar: 100 µm.(B) This panel presents a stacked histogram showcasing the number of tracks for each biological repeat under different conditions.Each biological repeat is color-coded, and each histogram segment's specific number of tracks is annotated.

Figure 4 :
Figure 4: Exploring filopodia dynamics using CellTracksColab Subsequently, clustering analysis was performed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN).The parameters included clustering_data_source set to UMAP, min_samples at 20, min_cluster_size at 200, and the metric employed was Euclidean.