zl程序教程

您现在的位置是:首页 >  大数据

当前栏目

1000万数据对比ContainsAll实测

数据 对比 1000 实测
2023-06-13 09:15:00 时间

JDK版本java11

 public void timeoutReminder() {
        List<String> list = new ArrayList<>();

        List<String> list2 = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            Random random = new Random();
            list.add(getRandomString(random.nextInt(100)));
            list2.add(getRandomString(random.nextInt(100)));
        }
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        list.containsAll(list2);
        stopWatch.stop();
        System.out.println(stopWatch.getTotalTimeSeconds());
        StopWatch stopWatch2 = new StopWatch();
        stopWatch2.start();
        CollectionUtils.containsAll(list, list2);
        stopWatch2.stop();
        System.out.println(stopWatch2.getTotalTimeSeconds());

    }
0.571
17.27

源码 java.util.List#containsAll

 public boolean containsAll(Collection<?> c) {
        Object[] es = getArray();
        int len = es.length;
        for (Object e : c) {
            if (indexOfRange(e, es, 0, len) < 0)
                return false;
        }
        return true;
    }
      private static int indexOfRange(Object o, Object[] es, int from, int to) {
        if (o == null) {
            for (int i = from; i < to; i++)
                if (es[i] == null)
                    return i;
        } else {
            for (int i = from; i < to; i++)
                if (o.equals(es[i]))
                    return i;
        }
        return -1;
    }

从源码中看到在循环内还有list.len次循环,时间复杂度为0(N2);

.collections4.CollectionUtils#containsAll

而在CollectionUtils中除了用set去重之外,还在遍历前中分别加入了过滤条件

 /**
     * Returns <code>true</code> iff all elements of {@code coll2} are also contained
     * in {@code coll1}. The cardinality of values in {@code coll2} is not taken into account,
     * which is the same behavior as {@link Collection#containsAll(Collection)}.
     * <p>
     * In other words, this method returns <code>true</code> iff the
     * {@link #intersection} of <i>coll1</i> and <i>coll2</i> has the same cardinality as
     * the set of unique values from {@code coll2}. In case {@code coll2} is empty, {@code true}
     * will be returned.
     * <p>
     * This method is intended as a replacement for {@link Collection#containsAll(Collection)}
     * with a guaranteed runtime complexity of {@code O(n + m)}. Depending on the type of
     * {@link Collection} provided, this method will be much faster than calling
     * {@link Collection#containsAll(Collection)} instead, though this will come at the
     * cost of an additional space complexity O(n).
     *
     * @param coll1  the first collection, must not be null
     * @param coll2  the second collection, must not be null
     * @return <code>true</code> iff the intersection of the collections has the same cardinality
     *   as the set of unique elements from the second collection
     * @since 4.0
     */
     public static boolean containsAll(final Collection<?> coll1, final Collection<?> coll2) {
        if (coll2.isEmpty()) {
            return true;
        } else {
            final Iterator<?> it = coll1.iterator();
            final Set<Object> elementsAlreadySeen = new HashSet<Object>();
            for (final Object nextElement : coll2) {
                if (elementsAlreadySeen.contains(nextElement)) {
                    continue;
                }

                boolean foundCurrentElement = false;
                while (it.hasNext()) {
                    final Object p = it.next();
                    elementsAlreadySeen.add(p);
                    if (nextElement == null ? p == null : nextElement.equals(p)) {
                        foundCurrentElement = true;
                        break;
                    }
                }

                if (foundCurrentElement) {
                    continue;
                } else {
                    return false;
                }
            }
            return true;
        }
    }

理论上在处理数据时应该是CollectionUtils的containsAll方法个更快的,但是实测的简单非对象存储数据随机数,反而list.containsAll更快,实际场景还是要实际分析的