**Figure 1**: Comparison of projections of the 5-Gaussian dataset produced by four t-SNE methods. a) t-SNE produced misaligned layouts across all four time frames. b) Equal-initialization t-SNE provides better visual consistency than t-SNE, but clusters still move unnecessarily. c) Dynamic t-SNE showed a smoothing effect, distorting the projections at t = 2 and 3. d) Joint t-SNE generated coherent and reliable projections that reflected the ground-truth transformations of the clusters.

We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique for generating comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely used to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently with t-SNE, the resulting layouts are misaligned: even items with identical features across datasets are projected to different locations, which makes the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then add a novel loss term, which we call vector constraints, to the t-SNE loss function. Vector constraints preserve the vectors between projected points across projections, allowing these points to serve as visual landmarks for direct comparison between projections. Using synthetic datasets with known ground-truth structures, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case: visualizing and comparing the activations of different layers of a neural network.
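The page does not spell out how the GFD or the edge similarity is computed, so the following is only a hypothetical sketch of the idea. It assumes: a symmetrised k-nearest-neighbour graph per frame (`n_neighbors` is our own parameter); per-node counts of the three orbits of connected 3-node graphlets (in line with the 3-node simplification of Figure 2); a node similarity defined as one minus the total-variation distance between a node's GFDs in the two frames; and an edge similarity taken as the product of its endpoints' node similarities. All function names are illustrative, and this is not the authors' implementation.

```python
import numpy as np


def knn_adjacency(X: np.ndarray, n_neighbors: int) -> np.ndarray:
    """Symmetric 0/1 adjacency matrix of a k-nearest-neighbour graph (no self-loops)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a point is never its own neighbour
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    A = np.zeros(d2.shape, dtype=np.int64)
    A[np.arange(len(X))[:, None], nn] = 1
    return np.maximum(A, A.T)                       # i ~ j if either one lists the other


def orbit_counts(A: np.ndarray) -> np.ndarray:
    """Per-node counts of the three orbits of connected 3-node graphlets."""
    d = A.sum(axis=1)
    tri = np.diag(A @ A @ A) // 2        # triangles that contain node i
    center = d * (d - 1) // 2 - tri      # i is the middle of an open path j-i-k
    end = A @ (d - 1) - 2 * tri          # i is an end of an open path i-j-k
    return np.stack([end, center, tri], axis=1).astype(float)


def gfd(A: np.ndarray) -> np.ndarray:
    """Normalise each node's orbit counts into a frequency distribution."""
    counts = orbit_counts(A)
    total = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, total, out=np.zeros_like(counts), where=total > 0)


def edge_similarity(X_prev: np.ndarray, X_next: np.ndarray, n_neighbors: int = 10):
    """Hypothetical edge similarity between two frames of the same N items.

    Returns the kNN edges (i, j), i < j, of the previous frame and one
    similarity in [0, 1] per edge (1 = local structure unchanged).
    """
    A_prev = knn_adjacency(X_prev, n_neighbors)
    g_prev = gfd(A_prev)
    g_next = gfd(knn_adjacency(X_next, n_neighbors))
    # 1 - total-variation distance between each node's GFDs in the two frames
    node_sim = 1.0 - 0.5 * np.abs(g_prev - g_next).sum(axis=1)
    edges = np.argwhere(np.triu(A_prev, 1))
    return edges, node_sim[edges[:, 0]] * node_sim[edges[:, 1]]
```

When both frames are identical, every edge gets similarity 1. A point whose local graph structure changes lowers the weight of every edge it touches, so its constraints can be relaxed.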


**Figure 2**: Technical illustration of Joint t-SNE. For simplicity, only 3-node graphlets are considered. a) Some changes occur between $X_0$ and $X_1$: several points break their neighborhood relationship with the original cluster. Joint t-SNE finds such changes by measuring the similarity of local structures, and it computes edge similarities accordingly ($S_{e_{12}} > S_{e_{13}} > S_{e_{14}}$). b) Using each edge similarity as the weight of the corresponding vector constraint, Joint t-SNE generates projection $Y_1$, which preserves the relative positions of points in $Y_0$ to the degree set by those weights.
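Figure 2 states that edge similarities weight the vector constraints but does not give the loss itself. Below is a minimal sketch, assuming a squared-error form: for each edge $(i, j)$ with weight $s_{ij}$, the sketch penalises $\gamma\, s_{ij} \lVert (y'_i - y'_j) - (y_i - y_j) \rVert^2$, where $y$ are positions in $Y_0$ and $y'$ are positions in $Y_1$. The function name, the squared-error form, and the $\gamma$ scaling are our assumptions. In an optimiser, the returned gradient would be added to the usual t-SNE KL-divergence gradient for $Y_1$.

```python
import numpy as np


def vector_constraint_loss(Y_prev, Y_next, edges, weights, gamma=1.0):
    """Weighted penalty on how much each vector y_i - y_j changes between projections.

    Y_prev, Y_next: (N, 2) projections of the same N items (e.g. Y_0 and Y_1).
    edges: (M, 2) integer pairs (i, j); weights: (M,) edge similarities.
    Returns (loss, gradient of the loss with respect to Y_next).
    """
    i, j = edges[:, 0], edges[:, 1]
    # Change of the vector between i and j from the previous to the next projection
    diff = (Y_next[i] - Y_next[j]) - (Y_prev[i] - Y_prev[j])
    loss = gamma * np.sum(weights * np.sum(diff**2, axis=1))
    g = 2.0 * gamma * weights[:, None] * diff   # d loss / d (y_i - y_j) for each edge
    grad = np.zeros_like(Y_next, dtype=float)
    np.add.at(grad, i, g)    # y_i enters each difference with a + sign
    np.add.at(grad, j, -g)   # y_j enters with a - sign
    return loss, grad
```

Only differences $y_i - y_j$ are constrained, so translating the whole projection leaves the penalty at zero. An edge with weight near 0, such as one touching a point that left its cluster, lets that pair separate almost freely.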

**Figure 3**: Comparison of MNIST dataset projections produced by three t-SNE techniques. Because the long-range interference of Dynamic t-SNE was too severe in this case, we did not use its result for t = 0 as the initial projection for the other methods, unlike in the other cases. a) Projecting each frame separately with Equal-initialization t-SNE faithfully reveals the underlying structure, but the results can be misleading when used for comparison tasks. For example, viewers may think that the red cluster for digit 2 changed and that the purple cluster for digit 3 moved to the green cluster for digit 1, whereas the ground truth is the opposite. b) In Dynamic t-SNE, the changes between two frames appear subtle even though the data changed substantially; these projections are too consistent and sacrifice fidelity. c) Joint t-SNE successfully detected the changes in topology while preserving the local structure of the red cluster for digit 2, which did not change. d) A comparison of a local subspace shows that Dynamic t-SNE created an artifact of digit 2 on the right side, whereas the other methods did not.

**Figure 4**: Comparison of VGG dataset projections produced by four t-SNE techniques. a) and b) t-SNE and Equal-initialization t-SNE produced faithful but inconsistent projections; note the red and pink clusters moving around the center. c) Dynamic t-SNE produced projections that are too rigid to reflect abrupt changes in topology. For example, the green and light green clusters remain separated at the end, failing to escape from their initial positions. d) Joint t-SNE generated more faithful projections that are robust to such abrupt changes. Points from each pair of classes (two classes sharing the same hue) gather together, as in the t-SNE and Equal-initialization t-SNE projections, while visual consistency is maintained.

**Figure 5**: Comparison of the effect of hyperparameters in Dynamic and Joint t-SNE. For ease of comparison, we used the result from Dynamic t-SNE as the projection of the first time frame. For Dynamic t-SNE, as λ increases from 0.01 to 0.1, long-range interference at t = 0 becomes more obvious: the red and purple clusters move closer together, anticipating a change that only occurs in a later frame. The same holds for the smoothing effect at t = 1. Joint t-SNE is robust to changes in γ; the red and purple clusters fully overlap regardless of γ.
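The roles of λ and γ can be made concrete by comparing the two objectives. Dynamic t-SNE is usually written as a single objective over all $T$ frames, with λ scaling a penalty on each point's displacement between consecutive frames. This page gives no formula for Joint t-SNE. The second line below is therefore a hypothetical form, assuming that the vector constraints are a squared-error penalty over an edge set $E$, weighted by edge similarities $s_{ij}$, and applied when frame $t$ is computed from the already fixed projection $Y^{(t-1)}$.

$$
C_{\text{dyn}} = \sum_{t=1}^{T} \mathrm{KL}\left(P^{(t)} \,\|\, Q^{(t)}\right) + \frac{\lambda}{2T} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \left\lVert y_i^{(t)} - y_i^{(t+1)} \right\rVert^2
$$

$$
C_{\text{joint}}^{(t)} = \mathrm{KL}\left(P^{(t)} \,\|\, Q^{(t)}\right) + \gamma \sum_{(i,j) \in E} s_{ij} \left\lVert \left(y_i^{(t)} - y_j^{(t)}\right) - \left(y_i^{(t-1)} - y_j^{(t-1)}\right) \right\rVert^2
$$

Under these assumptions, the behaviour in Figure 5 has a simple reading. $C_{\text{dyn}}$ optimises all frames together, so a larger λ lets later frames pull on earlier ones (long-range interference) and pulls consecutive frames toward each other (smoothing). In the hypothetical $C_{\text{joint}}^{(t)}$, frame $t$ is tied only to the fixed previous projection. Edges with low $s_{ij}$ barely constrain points whose neighbourhoods changed, which is consistent with the reported robustness to γ.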

This work is supported by grants from the NSFC (61772315, 61861136012), the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2020C08), and a CAS grant (GJHZ1862).