generic: 6.6: drop backport patches

Drop all backport patches that are now included in kernel 6.6.

Signed-off-by: Weijie Gao <hackpascal@gmail.com>
Author:    Weijie Gao <hackpascal@gmail.com>  2024-01-04 01:47:10 +08:00
Committed: Robert Marko
Parent:    8a9273d51e
Commit:    8c3892b1d7

339 changed files with 0 additions and 44789 deletions

@@ -1,352 +0,0 @@
From 8c20e2eb5f2a0175b774134685e4d7bd93e85ff8 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:18:59 -0700
Subject: [PATCH 01/19] UPSTREAM: mm: multi-gen LRU: rename lru_gen_struct to
lru_gen_folio

Patch series "mm: multi-gen LRU: memcg LRU", v3.

Overview
========

An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).
Its goal is to improve the scalability of global reclaim, which is
critical to system-wide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.
Its memory bloat is a pointer to each lruvec and negligible to each
pglist_data. In terms of traversing memcgs during global reclaim, it
improves the best-case complexity from O(n) to O(1) and does not affect
the worst-case complexity O(n). Therefore, on average, it has a sublinear
complexity in contrast to the current linear complexity.
The basic structure of an memcg LRU can be understood by an analogy to
the active/inactive LRU (of folios):
1. It has the young and the old (generations), i.e., the counterparts
to the active and the inactive;
2. The increment of max_seq triggers promotion, i.e., the counterpart
to activation;
3. Other events trigger similar operations, e.g., offlining an memcg
triggers demotion, i.e., the counterpart to deactivation.
In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out at will
and reduces latency without affecting fairness over some time.
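
Sharding here just means "pick a random bin, then sweep the rest". A
minimal userspace sketch of the idea (not the kernel code; walk_bins()
and scan_bin() are made-up stand-ins for the real iterators):

    #include <stdio.h>
    #include <stdlib.h>

    #define NR_BINS 8

    /* each reclaim thread starts at a random bin, so concurrent threads
     * tend to work on different memcgs instead of contending on one list;
     * a thread may also stop early, since later passes will eventually
     * visit the bins it skipped */
    static void walk_bins(void (*scan)(int bin))
    {
            int first = rand() % NR_BINS;

            for (int i = 0; i < NR_BINS; i++)
                    scan((first + i) % NR_BINS);
    }

    static void scan_bin(int bin)
    {
            printf("scanning bin %d\n", bin);
    }

    int main(void)
    {
            srand(42);
            walk_bins(scan_bin);
            return 0;
    }
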
The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com/
The following is a simple test to quickly verify its effectiveness.
Test design:
1. Create multiple memcgs.
2. Each memcg contains a job (fio).
3. All jobs access the same amount of memory randomly.
4. The system does not experience global memory pressure.
5. Periodically write to the root memory.reclaim.
Desired outcome:
1. All memcgs have similar pgsteal counts, i.e., stddev(pgsteal)
over mean(pgsteal) is close to 0%.
2. The total pgsteal is close to the total requested through
memory.reclaim, i.e., sum(pgsteal) over sum(requested) is close
to 100%.
Actual outcome [1]:

                                      MGLRU off    MGLRU on
    stddev(pgsteal) / mean(pgsteal)         75%         20%
    sum(pgsteal) / sum(requested)          425%         95%
####################################################################
MEMCGS=128

for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    mkdir /sys/fs/cgroup/memcg$memcg
done

start() {
    echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs

    fio -name=memcg$memcg --numjobs=1 --ioengine=mmap \
        --filename=/dev/zero --size=1920M --rw=randrw \
        --rate=64m,64m --random_distribution=random \
        --fadvise_hint=0 --time_based --runtime=10h \
        --group_reporting --minimal
}

for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    start &
done

sleep 600

for ((i = 0; i < 600; i++)); do
    echo 256m >/sys/fs/cgroup/memory.reclaim
    sleep 6
done

for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
done
####################################################################
[1]: This was obtained from running the above script (touches less
than 256GB memory) on an EPYC 7B13 with 512GB DRAM for over an
hour.
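
For reference, stddev(pgsteal) / mean(pgsteal) can be computed from the
grepped counts with a short helper; this is a hypothetical
post-processing tool (compile with -lm), not part of the original test:

    #include <math.h>
    #include <stdio.h>

    /* reads one pgsteal count per line (the numbers grepped from each
     * memcg's memory.stat above) and prints stddev(pgsteal)/mean(pgsteal) */
    int main(void)
    {
            double x, sum = 0.0, sumsq = 0.0;
            long n = 0;

            while (scanf("%lf", &x) == 1) {
                    sum += x;
                    sumsq += x * x;
                    n++;
            }
            if (n == 0)
                    return 1;

            double mean = sum / n;
            double stddev = sqrt(sumsq / n - mean * mean);

            printf("stddev/mean = %.0f%%\n", 100.0 * stddev / mean);
            return 0;
    }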

This patch (of 8):

The new name lru_gen_folio will be more distinct from the coming
lru_gen_memcg.
Link: https://lkml.kernel.org/r/20221222041905.2431096-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/20221222041905.2431096-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 391655fe08d1f942359a11148aa9aaf3f99d6d6f)
Change-Id: I7df67e0e2435ba28f10eaa57d28d98b61a9210a6
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/mm_inline.h | 4 ++--
include/linux/mmzone.h | 6 +++---
mm/vmscan.c | 34 +++++++++++++++++-----------------
mm/workingset.c | 4 ++--
4 files changed, 24 insertions(+), 24 deletions(-)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -178,7 +178,7 @@ static inline void lru_gen_update_size(s
int zone = folio_zonenum(folio);
int delta = folio_nr_pages(folio);
enum lru_list lru = type * LRU_INACTIVE_FILE;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE(old_gen != -1 && old_gen >= MAX_NR_GENS);
VM_WARN_ON_ONCE(new_gen != -1 && new_gen >= MAX_NR_GENS);
@@ -224,7 +224,7 @@ static inline bool lru_gen_add_folio(str
int gen = folio_lru_gen(folio);
int type = folio_is_file_lru(folio);
int zone = folio_zonenum(folio);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -404,7 +404,7 @@ enum {
* The number of pages in each generation is eventually consistent and therefore
* can be transiently negative when reset_batch_size() is pending.
*/
-struct lru_gen_struct {
+struct lru_gen_folio {
/* the aging increments the youngest generation number */
unsigned long max_seq;
/* the eviction increments the oldest generation numbers */
@@ -461,7 +461,7 @@ struct lru_gen_mm_state {
struct lru_gen_mm_walk {
/* the lruvec under reclaim */
struct lruvec *lruvec;
- /* unstable max_seq from lru_gen_struct */
+ /* unstable max_seq from lru_gen_folio */
unsigned long max_seq;
/* the next address within an mm to scan */
unsigned long next_addr;
@@ -524,7 +524,7 @@ struct lruvec {
unsigned long flags;
#ifdef CONFIG_LRU_GEN
/* evictable pages divided into generations */
- struct lru_gen_struct lrugen;
+ struct lru_gen_folio lrugen;
/* to concurrently iterate lru_gen_mm_list */
struct lru_gen_mm_state mm_state;
#endif
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3190,7 +3190,7 @@ static int get_nr_gens(struct lruvec *lr
static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
{
- /* see the comment on lru_gen_struct */
+ /* see the comment on lru_gen_folio */
return get_nr_gens(lruvec, LRU_GEN_FILE) >= MIN_NR_GENS &&
get_nr_gens(lruvec, LRU_GEN_FILE) <= get_nr_gens(lruvec, LRU_GEN_ANON) &&
get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
@@ -3596,7 +3596,7 @@ struct ctrl_pos {
static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
struct ctrl_pos *pos)
{
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int hist = lru_hist_from_seq(lrugen->min_seq[type]);
pos->refaulted = lrugen->avg_refaulted[type][tier] +
@@ -3611,7 +3611,7 @@ static void read_ctrl_pos(struct lruvec
static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
{
int hist, tier;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
bool clear = carryover ? NR_HIST_GENS == 1 : NR_HIST_GENS > 1;
unsigned long seq = carryover ? lrugen->min_seq[type] : lrugen->max_seq + 1;
@@ -3688,7 +3688,7 @@ static int folio_update_gen(struct folio
static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
int type = folio_is_file_lru(folio);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
@@ -3733,7 +3733,7 @@ static void update_batch_size(struct lru
static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk)
{
int gen, type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
walk->batched = 0;
@@ -4250,7 +4250,7 @@ static bool inc_min_seq(struct lruvec *l
{
int zone;
int remaining = MAX_LRU_BATCH;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
if (type == LRU_GEN_ANON && !can_swap)
@@ -4286,7 +4286,7 @@ static bool try_to_inc_min_seq(struct lr
{
int gen, type, zone;
bool success = false;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
DEFINE_MIN_SEQ(lruvec);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4307,7 +4307,7 @@ next:
;
}
- /* see the comment on lru_gen_struct */
+ /* see the comment on lru_gen_folio */
if (can_swap) {
min_seq[LRU_GEN_ANON] = min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FILE]);
min_seq[LRU_GEN_FILE] = max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU_GEN_FILE]);
@@ -4329,7 +4329,7 @@ static void inc_max_seq(struct lruvec *l
{
int prev, next;
int type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
restart:
spin_lock_irq(&lruvec->lru_lock);
@@ -4389,7 +4389,7 @@ static bool try_to_inc_max_seq(struct lr
bool success;
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
@@ -4454,7 +4454,7 @@ static bool should_run_aging(struct lruv
unsigned long old = 0;
unsigned long young = 0;
unsigned long total = 0;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
for (type = !can_swap; type < ANON_AND_FILE; type++) {
@@ -4740,7 +4740,7 @@ static bool sort_folio(struct lruvec *lr
int delta = folio_nr_pages(folio);
int refs = folio_lru_refs(folio);
int tier = lru_tier_from_refs(refs);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
@@ -4848,7 +4848,7 @@ static int scan_folios(struct lruvec *lr
int scanned = 0;
int isolated = 0;
int remaining = MAX_LRU_BATCH;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
VM_WARN_ON_ONCE(!list_empty(list));
@@ -5249,7 +5249,7 @@ done:
static bool __maybe_unused state_is_valid(struct lruvec *lruvec)
{
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
if (lrugen->enabled) {
enum lru_list lru;
@@ -5531,7 +5531,7 @@ static void lru_gen_seq_show_full(struct
int i;
int type, tier;
int hist = lru_hist_from_seq(seq);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
@@ -5581,7 +5581,7 @@ static int lru_gen_seq_show(struct seq_f
unsigned long seq;
bool full = !debugfs_real_fops(m->file)->write;
struct lruvec *lruvec = v;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int nid = lruvec_pgdat(lruvec)->node_id;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
@@ -5835,7 +5835,7 @@ void lru_gen_init_lruvec(struct lruvec *
{
int i;
int gen, type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
lrugen->max_seq = MIN_NR_GENS + 1;
lrugen->enabled = lru_gen_enabled();
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -223,7 +223,7 @@ static void *lru_gen_eviction(struct fol
unsigned long token;
unsigned long min_seq;
struct lruvec *lruvec;
- struct lru_gen_struct *lrugen;
+ struct lru_gen_folio *lrugen;
int type = folio_is_file_lru(folio);
int delta = folio_nr_pages(folio);
int refs = folio_lru_refs(folio);
@@ -252,7 +252,7 @@ static void lru_gen_refault(struct folio
unsigned long token;
unsigned long min_seq;
struct lruvec *lruvec;
- struct lru_gen_struct *lrugen;
+ struct lru_gen_folio *lrugen;
struct mem_cgroup *memcg;
struct pglist_data *pgdat;
int type = folio_is_file_lru(folio);

@@ -1,192 +0,0 @@
From 14f9a7a15f3d1af351f30e0438fd747b7ac253b0 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:01 -0700
Subject: [PATCH 03/19] UPSTREAM: mm: multi-gen LRU: remove eviction fairness
safeguard

Recall that the eviction consumes the oldest generation: first it
bucket-sorts folios whose gen counters were updated by the aging and
reclaims the rest; then it increments lrugen->min_seq.
The current eviction fairness safeguard for global reclaim has a
dilemma: when there are multiple eligible memcgs, should it continue
or stop upon meeting the reclaim goal? If it continues, it overshoots
and increases direct reclaim latency; if it stops, it loses fairness
between memcgs it has taken memory away from and those it has yet to.
With memcg LRU, the eviction, while ensuring eventual fairness, will
stop upon meeting its goal. Therefore the current eviction fairness
safeguard for global reclaim will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the eviction will continue, even if it is overshooting. This becomes
unconditional due to code simplification.
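
The "bail out at will" behavior leans on a small C idiom visible in the
diff below: get_nr_to_reclaim() returns -1 for memcg reclaim, which an
unsigned long stores as ULONG_MAX, so the
sc->nr_reclaimed >= nr_to_reclaim check can never fire. A standalone
demonstration of the idiom:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned long nr_to_reclaim = -1; /* wraps to ULONG_MAX */
            unsigned long nr_reclaimed = 1UL << 40;

            /* never true for any realistic reclaim count */
            printf("bail out: %s\n",
                   nr_reclaimed >= nr_to_reclaim ? "yes" : "no");
            printf("nr_to_reclaim = %lu (ULONG_MAX = %lu)\n",
                   nr_to_reclaim, ULONG_MAX);
            return 0;
    }
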
Link: https://lkml.kernel.org/r/20221222041905.2431096-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit a579086c99ed70cc4bfc104348dbe3dd8f2787e6)
Change-Id: I08ac1b3c90e29cafd0566785aaa4bcdb5db7d22c
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 81 +++++++++++++++--------------------------------------
1 file changed, 23 insertions(+), 58 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -448,6 +448,11 @@ static bool cgroup_reclaim(struct scan_c
return sc->target_mem_cgroup;
}
+static bool global_reclaim(struct scan_control *sc)
+{
+ return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
+}
+
/**
* writeback_throttling_sane - is the usual dirty throttling mechanism available?
* @sc: scan_control in question
@@ -498,6 +503,11 @@ static bool cgroup_reclaim(struct scan_c
return false;
}
+static bool global_reclaim(struct scan_control *sc)
+{
+ return true;
+}
+
static bool writeback_throttling_sane(struct scan_control *sc)
{
return true;
@@ -5005,8 +5015,7 @@ static int isolate_folios(struct lruvec
return scanned;
}
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
- bool *need_swapping)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
{
int type;
int scanned;
@@ -5095,9 +5104,6 @@ retry:
goto retry;
}
- if (need_swapping && type == LRU_GEN_ANON)
- *need_swapping = true;
-
return scanned;
}
@@ -5136,67 +5142,26 @@ done:
return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
}
-static bool should_abort_scan(struct lruvec *lruvec, unsigned long seq,
- struct scan_control *sc, bool need_swapping)
+static unsigned long get_nr_to_reclaim(struct scan_control *sc)
{
- int i;
- DEFINE_MAX_SEQ(lruvec);
-
- if (!current_is_kswapd()) {
- /* age each memcg at most once to ensure fairness */
- if (max_seq - seq > 1)
- return true;
-
- /* over-swapping can increase allocation latency */
- if (sc->nr_reclaimed >= sc->nr_to_reclaim && need_swapping)
- return true;
-
- /* give this thread a chance to exit and free its memory */
- if (fatal_signal_pending(current)) {
- sc->nr_reclaimed += MIN_LRU_BATCH;
- return true;
- }
-
- if (cgroup_reclaim(sc))
- return false;
- } else if (sc->nr_reclaimed - sc->last_reclaimed < sc->nr_to_reclaim)
- return false;
-
- /* keep scanning at low priorities to ensure fairness */
- if (sc->priority > DEF_PRIORITY - 2)
- return false;
-
- /*
- * A minimum amount of work was done under global memory pressure. For
- * kswapd, it may be overshooting. For direct reclaim, the allocation
- * may succeed if all suitable zones are somewhat safe. In either case,
- * it's better to stop now, and restart later if necessary.
- */
- for (i = 0; i <= sc->reclaim_idx; i++) {
- unsigned long wmark;
- struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
-
- if (!managed_zone(zone))
- continue;
-
- wmark = current_is_kswapd() ? high_wmark_pages(zone) : low_wmark_pages(zone);
- if (wmark > zone_page_state(zone, NR_FREE_PAGES))
- return false;
- }
+ /* don't abort memcg reclaim to ensure fairness */
+ if (!global_reclaim(sc))
+ return -1;
- sc->nr_reclaimed += MIN_LRU_BATCH;
+ /* discount the previous progress for kswapd */
+ if (current_is_kswapd())
+ return sc->nr_to_reclaim + sc->last_reclaimed;
- return true;
+ return max(sc->nr_to_reclaim, compact_gap(sc->order));
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
bool need_aging = false;
- bool need_swapping = false;
unsigned long scanned = 0;
unsigned long reclaimed = sc->nr_reclaimed;
- DEFINE_MAX_SEQ(lruvec);
+ unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
lru_add_drain();
@@ -5220,7 +5185,7 @@ static void lru_gen_shrink_lruvec(struct
if (!nr_to_scan)
goto done;
- delta = evict_folios(lruvec, sc, swappiness, &need_swapping);
+ delta = evict_folios(lruvec, sc, swappiness);
if (!delta)
goto done;
@@ -5228,7 +5193,7 @@ static void lru_gen_shrink_lruvec(struct
if (scanned >= nr_to_scan)
break;
- if (should_abort_scan(lruvec, max_seq, sc, need_swapping))
+ if (sc->nr_reclaimed >= nr_to_reclaim)
break;
cond_resched();
@@ -5678,7 +5643,7 @@ static int run_eviction(struct lruvec *l
if (sc->nr_reclaimed >= nr_to_reclaim)
return 0;
- if (!evict_folios(lruvec, sc, swappiness, NULL))
+ if (!evict_folios(lruvec, sc, swappiness))
return 0;
cond_resched();

@@ -1,294 +0,0 @@
From f3c93d2e37a3c56593d7ccf4f4bcf1b58426fdd8 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:02 -0700
Subject: [PATCH 04/19] BACKPORT: mm: multi-gen LRU: remove aging fairness
safeguard

Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.
The current aging fairness safeguard for kswapd uses two passes to
ensure the fairness to multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, on
which it ages all those memcgs at the same time.
With memcg LRU, the aging, while ensuring eventual fairness, will run
when necessary. Therefore the current aging fairness safeguard for
kswapd will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.
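
The reworked should_run_aging() in the diff below opens with an "out of
cold folios" test on the sequence counters; the arithmetic is easy to
check in isolation (a sketch assuming the upstream value MIN_NR_GENS = 2):

    #include <stdio.h>

    #define MIN_NR_GENS 2

    int main(void)
    {
            /* nr_gens = max_seq - min_seq + 1; eviction stalls once it
             * drops to MIN_NR_GENS, so the aging must run first */
            unsigned long max_seq = 4, min_seq = 3;

            if (min_seq + MIN_NR_GENS > max_seq)
                    printf("out of cold folios: age first\n");
            else
                    printf("cold folios left: evict\n");
            return 0;
    }
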
Link: https://lkml.kernel.org/r/20221222041905.2431096-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 7348cc91821b0cb24dfb00e578047f68299a50ab)
[TJ: Resolved conflicts with older function signatures for
min_cgroup_below_min / min_cgroup_below_low]
Change-Id: I6e36ecfbaaefbc0a56d9a9d5d7cbe404ed7f57a5
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 126 ++++++++++++++++++++++++----------------------------
1 file changed, 59 insertions(+), 67 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,7 +136,6 @@ struct scan_control {
#ifdef CONFIG_LRU_GEN
/* help kswapd make better choices among multiple memcgs */
- unsigned int memcgs_need_aging:1;
unsigned long last_reclaimed;
#endif
@@ -4457,7 +4456,7 @@ done:
return true;
}
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsigned long *min_seq,
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
{
int gen, type, zone;
@@ -4466,6 +4465,13 @@ static bool should_run_aging(struct lruv
unsigned long total = 0;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
+
+ /* whether this lruvec is completely out of cold folios */
+ if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+ *nr_to_scan = 0;
+ return true;
+ }
for (type = !can_swap; type < ANON_AND_FILE; type++) {
unsigned long seq;
@@ -4494,8 +4500,6 @@ static bool should_run_aging(struct lruv
* stalls when the number of generations reaches MIN_NR_GENS. Hence, the
* ideal number of generations is MIN_NR_GENS+1.
*/
- if (min_seq[!can_swap] + MIN_NR_GENS > max_seq)
- return true;
if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
return false;
@@ -4514,40 +4518,54 @@ static bool should_run_aging(struct lruv
return false;
}
-static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned long min_ttl)
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
- bool need_aging;
- unsigned long nr_to_scan;
- int swappiness = get_swappiness(lruvec, sc);
+ int gen, type, zone;
+ unsigned long total = 0;
+ bool can_swap = get_swappiness(lruvec, sc);
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
DEFINE_MIN_SEQ(lruvec);
- VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
+ for (type = !can_swap; type < ANON_AND_FILE; type++) {
+ unsigned long seq;
- mem_cgroup_calculate_protection(NULL, memcg);
+ for (seq = min_seq[type]; seq <= max_seq; seq++) {
+ gen = lru_gen_from_seq(seq);
- if (mem_cgroup_below_min(memcg))
- return false;
+ for (zone = 0; zone < MAX_NR_ZONES; zone++)
+ total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+ }
+ }
- need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan);
+ /* whether the size is big enough to be helpful */
+ return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+}
- if (min_ttl) {
- int gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
- unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc,
+ unsigned long min_ttl)
+{
+ int gen;
+ unsigned long birth;
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
- if (time_is_after_jiffies(birth + min_ttl))
- return false;
+ VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
- /* the size is likely too small to be helpful */
- if (!nr_to_scan && sc->priority != DEF_PRIORITY)
- return false;
- }
+ /* see the comment on lru_gen_folio */
+ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+ birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- if (need_aging)
- try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
+ if (time_is_after_jiffies(birth + min_ttl))
+ return false;
- return true;
+ if (!lruvec_is_sizable(lruvec, sc))
+ return false;
+
+ mem_cgroup_calculate_protection(NULL, memcg);
+
+ return !mem_cgroup_below_min(memcg);
}
/* to protect the working set of the last N jiffies */
@@ -4556,46 +4574,32 @@ static unsigned long lru_gen_min_ttl __r
static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct mem_cgroup *memcg;
- bool success = false;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
VM_WARN_ON_ONCE(!current_is_kswapd());
sc->last_reclaimed = sc->nr_reclaimed;
- /*
- * To reduce the chance of going into the aging path, which can be
- * costly, optimistically skip it if the flag below was cleared in the
- * eviction path. This improves the overall performance when multiple
- * memcgs are available.
- */
- if (!sc->memcgs_need_aging) {
- sc->memcgs_need_aging = true;
+ /* check the order to exclude compaction-induced reclaim */
+ if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
return;
- }
-
- set_mm_walk(pgdat);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (age_lruvec(lruvec, sc, min_ttl))
- success = true;
+ if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
+ mem_cgroup_iter_break(NULL, memcg);
+ return;
+ }
cond_resched();
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
- clear_mm_walk();
-
- /* check the order to exclude compaction-induced reclaim */
- if (success || !min_ttl || sc->order)
- return;
-
/*
* The main goal is to OOM kill if every generation from all memcgs is
* younger than min_ttl. However, another possibility is all memcgs are
- * either below min or empty.
+ * either too small or below min.
*/
if (mutex_trylock(&oom_lock)) {
struct oom_control oc = {
@@ -5113,33 +5117,27 @@ retry:
* reclaim.
*/
static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
- bool can_swap, bool *need_aging)
+ bool can_swap)
{
unsigned long nr_to_scan;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
- DEFINE_MIN_SEQ(lruvec);
if (mem_cgroup_below_min(memcg) ||
(mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
return 0;
- *need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan);
- if (!*need_aging)
+ if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
return nr_to_scan;
/* skip the aging path at the default priority */
if (sc->priority == DEF_PRIORITY)
- goto done;
+ return nr_to_scan;
- /* leave the work to lru_gen_age_node() */
- if (current_is_kswapd())
- return 0;
+ try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
- if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
- return nr_to_scan;
-done:
- return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
+ /* skip this lruvec as it's low on cold folios */
+ return 0;
}
static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5158,9 +5156,7 @@ static unsigned long get_nr_to_reclaim(s
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
- bool need_aging = false;
unsigned long scanned = 0;
- unsigned long reclaimed = sc->nr_reclaimed;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
lru_add_drain();
@@ -5181,13 +5177,13 @@ static void lru_gen_shrink_lruvec(struct
else
swappiness = 0;
- nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, &need_aging);
+ nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
if (!nr_to_scan)
- goto done;
+ break;
delta = evict_folios(lruvec, sc, swappiness);
if (!delta)
- goto done;
+ break;
scanned += delta;
if (scanned >= nr_to_scan)
@@ -5199,10 +5195,6 @@ static void lru_gen_shrink_lruvec(struct
cond_resched();
}
- /* see the comment in lru_gen_age_node() */
- if (sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH && !need_aging)
- sc->memcgs_need_aging = false;
-done:
clear_mm_walk();
blk_finish_plug(&plug);

@@ -1,166 +0,0 @@
From eca3858631e0cbad2ca6e40f788892749428e4cb Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:03 -0700
Subject: [PATCH 05/19] UPSTREAM: mm: multi-gen LRU: shuffle should_run_aging()

Move should_run_aging() next to its only caller left.
Link: https://lkml.kernel.org/r/20221222041905.2431096-6-yuzhao@google.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 77d4459a4a1a472b7309e475f962dda87d950abd)
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Change-Id: I3b0383fe16b93a783b4d8c0b3a0b325160392576
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 124 ++++++++++++++++++++++++++--------------------------
1 file changed, 62 insertions(+), 62 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4456,68 +4456,6 @@ done:
return true;
}
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
- struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
-{
- int gen, type, zone;
- unsigned long old = 0;
- unsigned long young = 0;
- unsigned long total = 0;
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- DEFINE_MIN_SEQ(lruvec);
-
- /* whether this lruvec is completely out of cold folios */
- if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
- *nr_to_scan = 0;
- return true;
- }
-
- for (type = !can_swap; type < ANON_AND_FILE; type++) {
- unsigned long seq;
-
- for (seq = min_seq[type]; seq <= max_seq; seq++) {
- unsigned long size = 0;
-
- gen = lru_gen_from_seq(seq);
-
- for (zone = 0; zone < MAX_NR_ZONES; zone++)
- size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
-
- total += size;
- if (seq == max_seq)
- young += size;
- else if (seq + MIN_NR_GENS == max_seq)
- old += size;
- }
- }
-
- /* try to scrape all its memory if this memcg was deleted */
- *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
-
- /*
- * The aging tries to be lazy to reduce the overhead, while the eviction
- * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
- * ideal number of generations is MIN_NR_GENS+1.
- */
- if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
- return false;
-
- /*
- * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
- * of the total number of pages for each generation. A reasonable range
- * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
- * aging cares about the upper bound of hot pages, while the eviction
- * cares about the lower bound of cold pages.
- */
- if (young * MIN_NR_GENS > total)
- return true;
- if (old * (MIN_NR_GENS + 2) < total)
- return true;
-
- return false;
-}
-
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
@@ -5111,6 +5049,68 @@ retry:
return scanned;
}
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
+ struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
+{
+ int gen, type, zone;
+ unsigned long old = 0;
+ unsigned long young = 0;
+ unsigned long total = 0;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
+
+ /* whether this lruvec is completely out of cold folios */
+ if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+ *nr_to_scan = 0;
+ return true;
+ }
+
+ for (type = !can_swap; type < ANON_AND_FILE; type++) {
+ unsigned long seq;
+
+ for (seq = min_seq[type]; seq <= max_seq; seq++) {
+ unsigned long size = 0;
+
+ gen = lru_gen_from_seq(seq);
+
+ for (zone = 0; zone < MAX_NR_ZONES; zone++)
+ size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+
+ total += size;
+ if (seq == max_seq)
+ young += size;
+ else if (seq + MIN_NR_GENS == max_seq)
+ old += size;
+ }
+ }
+
+ /* try to scrape all its memory if this memcg was deleted */
+ *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+
+ /*
+ * The aging tries to be lazy to reduce the overhead, while the eviction
+ * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
+ * ideal number of generations is MIN_NR_GENS+1.
+ */
+ if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
+ return false;
+
+ /*
+ * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
+ * of the total number of pages for each generation. A reasonable range
+ * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
+ * aging cares about the upper bound of hot pages, while the eviction
+ * cares about the lower bound of cold pages.
+ */
+ if (young * MIN_NR_GENS > total)
+ return true;
+ if (old * (MIN_NR_GENS + 2) < total)
+ return true;
+
+ return false;
+}
+
/*
* For future optimizations:
* 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg

@@ -1,876 +0,0 @@
From 8ee8571e47aa75221e5fbd4c9c7802fc4244c346 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:04 -0700
Subject: [PATCH 06/19] BACKPORT: mm: multi-gen LRU: per-node lru_gen_folio
lists

For each node, memcgs are divided into two generations: the old and
the young. For each generation, memcgs are randomly sharded into
multiple bins to improve scalability. For each bin, an RCU hlist_nulls
is virtually divided into three segments: the head, the tail and the
default.
An onlining memcg is added to the tail of a random bin in the old
generation. The eviction starts at the head of a random bin in the old
generation. The per-node memcg generation counter, whose remainder (mod
2) indexes the old generation, is incremented when all its bins become
empty.
There are four operations:
1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in
its current generation (old or young) and updates its "seg" to
"head";
2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in
its current generation (old or young) and updates its "seg" to
"tail";
3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in
the old generation, updates its "gen" to "old" and resets its "seg"
to "default";
4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin
in the young generation, updates its "gen" to "young" and resets
its "seg" to "default".
The events that trigger the above operations are:
1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
2. The first attempt to reclaim an memcg below low, which triggers
MEMCG_LRU_TAIL;
3. The first attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_TAIL;
4. The second attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_YOUNG;
5. Attempting to reclaim an memcg below min, which triggers
MEMCG_LRU_YOUNG;
6. Finishing the aging on the eviction path, which triggers
MEMCG_LRU_YOUNG;
7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
Note that memcg LRU only applies to global reclaim, and the
round-robin incrementing of their max_seq counters ensures the
eventual fairness to all eligible memcgs. For memcg reclaim, it still
relies on mem_cgroup_iter().
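
The old/young indexing is simply the counter's remainder: the
get_memcg_gen() macro in the diff below is ((seq) % MEMCG_NR_GENS). A
toy userspace demonstration of the same indexing:

    #include <stdio.h>

    #define MEMCG_NR_GENS 2

    /* the remainder (mod MEMCG_NR_GENS) indexes the old generation */
    #define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS)

    int main(void)
    {
            unsigned long seq = 0;

            printf("old gen   = %lu\n", get_memcg_gen(seq));      /* 0 */
            printf("young gen = %lu\n", get_memcg_gen(seq + 1));  /* 1 */

            /* once every bin of the old generation empties, seq is
             * incremented and the two generations swap roles */
            seq++;
            printf("old gen after increment = %lu\n", get_memcg_gen(seq));
            return 0;
    }
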
Link: https://lkml.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e4dde56cd208674ce899b47589f263499e5b8cdc)
[TJ: Resolved conflicts with older function signatures for
min_cgroup_below_min / min_cgroup_below_low and includes]
Change-Id: Idc8a0f635e035d72dd911f807d1224cb47cbd655
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/memcontrol.h | 10 +
include/linux/mm_inline.h | 17 ++
include/linux/mmzone.h | 117 +++++++++++-
mm/memcontrol.c | 16 ++
mm/page_alloc.c | 1 +
mm/vmscan.c | 374 +++++++++++++++++++++++++++++++++----
6 files changed, 500 insertions(+), 35 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -795,6 +795,11 @@ static inline void obj_cgroup_put(struct
percpu_ref_put(&objcg->refcnt);
}
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+ return !memcg || css_tryget(&memcg->css);
+}
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
if (memcg)
@@ -1295,6 +1300,11 @@ static inline void obj_cgroup_put(struct
{
}
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+ return true;
+}
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
}
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,6 +122,18 @@ static inline bool lru_gen_in_fault(void
return current->in_lru_fault;
}
+#ifdef CONFIG_MEMCG
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return READ_ONCE(lruvec->lrugen.seg);
+}
+#else
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+#endif
+
static inline int lru_gen_from_seq(unsigned long seq)
{
return seq % MAX_NR_GENS;
@@ -302,6 +314,11 @@ static inline bool lru_gen_in_fault(void
return false;
}
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+
static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
return false;
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -7,6 +7,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/list_nulls.h>
#include <linux/wait.h>
#include <linux/bitops.h>
#include <linux/cache.h>
@@ -367,6 +368,15 @@ struct page_vma_mapped_walk;
#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
+/* see the comment on MEMCG_NR_GENS */
+enum {
+ MEMCG_LRU_NOP,
+ MEMCG_LRU_HEAD,
+ MEMCG_LRU_TAIL,
+ MEMCG_LRU_OLD,
+ MEMCG_LRU_YOUNG,
+};
+
#ifdef CONFIG_LRU_GEN
enum {
@@ -426,6 +436,14 @@ struct lru_gen_folio {
atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
/* whether the multi-gen LRU is enabled */
bool enabled;
+#ifdef CONFIG_MEMCG
+ /* the memcg generation this lru_gen_folio belongs to */
+ u8 gen;
+ /* the list segment this lru_gen_folio belongs to */
+ u8 seg;
+ /* per-node lru_gen_folio list for global reclaim */
+ struct hlist_nulls_node list;
+#endif
};
enum {
@@ -479,12 +497,87 @@ void lru_gen_init_lruvec(struct lruvec *
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
#ifdef CONFIG_MEMCG
+
+/*
+ * For each node, memcgs are divided into two generations: the old and the
+ * young. For each generation, memcgs are randomly sharded into multiple bins
+ * to improve scalability. For each bin, the hlist_nulls is virtually divided
+ * into three segments: the head, the tail and the default.
+ *
+ * An onlining memcg is added to the tail of a random bin in the old generation.
+ * The eviction starts at the head of a random bin in the old generation. The
+ * per-node memcg generation counter, whose remainder (mod MEMCG_NR_GENS) indexes
+ * the old generation, is incremented when all its bins become empty.
+ *
+ * There are four operations:
+ * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
+ * current generation (old or young) and updates its "seg" to "head";
+ * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
+ * current generation (old or young) and updates its "seg" to "tail";
+ * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
+ * generation, updates its "gen" to "old" and resets its "seg" to "default";
+ * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
+ * young generation, updates its "gen" to "young" and resets its "seg" to
+ * "default".
+ *
+ * The events that trigger the above operations are:
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+ * 2. The first attempt to reclaim an memcg below low, which triggers
+ * MEMCG_LRU_TAIL;
+ * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
+ * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ *
+ * Note that memcg LRU only applies to global reclaim, and the round-robin
+ * incrementing of their max_seq counters ensures the eventual fairness to all
+ * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ */
+#define MEMCG_NR_GENS 2
+#define MEMCG_NR_BINS 8
+
+struct lru_gen_memcg {
+ /* the per-node memcg generation counter */
+ unsigned long seq;
+ /* each memcg has one lru_gen_folio per node */
+ unsigned long nr_memcgs[MEMCG_NR_GENS];
+ /* per-node lru_gen_folio list for global reclaim */
+ struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
+ /* protects the above */
+ spinlock_t lock;
+};
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat);
+
void lru_gen_init_memcg(struct mem_cgroup *memcg);
void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-#endif
+void lru_gen_online_memcg(struct mem_cgroup *memcg);
+void lru_gen_offline_memcg(struct mem_cgroup *memcg);
+void lru_gen_release_memcg(struct mem_cgroup *memcg);
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+
+#else /* !CONFIG_MEMCG */
+
+#define MEMCG_NR_GENS 1
+
+struct lru_gen_memcg {
+};
+
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
+#endif /* CONFIG_MEMCG */
#else /* !CONFIG_LRU_GEN */
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
{
}
@@ -494,6 +587,7 @@ static inline void lru_gen_look_around(s
}
#ifdef CONFIG_MEMCG
+
static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
}
@@ -501,7 +595,24 @@ static inline void lru_gen_init_memcg(st
static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
{
}
-#endif
+
+static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+}
+
+#endif /* CONFIG_MEMCG */
#endif /* CONFIG_LRU_GEN */
@@ -1219,6 +1330,8 @@ typedef struct pglist_data {
#ifdef CONFIG_LRU_GEN
/* kswap mm walk data */
struct lru_gen_mm_walk mm_walk;
+ /* lru_gen_folio list */
+ struct lru_gen_memcg memcg_lru;
#endif
CACHELINE_PADDING(_pad2_);
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -477,6 +477,16 @@ static void mem_cgroup_update_tree(struc
struct mem_cgroup_per_node *mz;
struct mem_cgroup_tree_per_node *mctz;
+ if (lru_gen_enabled()) {
+ struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+
+ return;
+ }
+
mctz = soft_limit_tree.rb_tree_per_node[nid];
if (!mctz)
return;
@@ -3524,6 +3534,9 @@ unsigned long mem_cgroup_soft_limit_recl
struct mem_cgroup_tree_per_node *mctz;
unsigned long excess;
+ if (lru_gen_enabled())
+ return 0;
+
if (order > 0)
return 0;
@@ -5387,6 +5400,7 @@ static int mem_cgroup_css_online(struct
if (unlikely(mem_cgroup_is_root(memcg)))
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
2UL*HZ);
+ lru_gen_online_memcg(memcg);
return 0;
offline_kmem:
memcg_offline_kmem(memcg);
@@ -5418,6 +5432,7 @@ static void mem_cgroup_css_offline(struc
memcg_offline_kmem(memcg);
reparent_shrinker_deferred(memcg);
wb_memcg_offline(memcg);
+ lru_gen_offline_memcg(memcg);
drain_all_stock(memcg);
@@ -5429,6 +5444,7 @@ static void mem_cgroup_css_released(stru
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
invalidate_reclaim_iterators(memcg);
+ lru_gen_release_memcg(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7943,6 +7943,7 @@ static void __init free_area_init_node(i
pgdat_set_deferred_range(pgdat);
free_area_init_core(pgdat);
+ lru_gen_init_pgdat(pgdat);
}
static void __init free_area_init_memoryless_node(int nid)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -54,6 +54,8 @@
#include <linux/shmem_fs.h>
#include <linux/ctype.h>
#include <linux/debugfs.h>
+#include <linux/rculist_nulls.h>
+#include <linux/random.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -134,11 +136,6 @@ struct scan_control {
/* Always discard instead of demoting to lower tier memory */
unsigned int no_demotion:1;
-#ifdef CONFIG_LRU_GEN
- /* help kswapd make better choices among multiple memcgs */
- unsigned long last_reclaimed;
-#endif
-
/* Allocation order */
s8 order;
@@ -3160,6 +3157,9 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(lru_gen_ca
for ((type) = 0; (type) < ANON_AND_FILE; (type)++) \
for ((zone) = 0; (zone) < MAX_NR_ZONES; (zone)++)
+#define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS)
+#define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS)
+
static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
{
struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4442,8 +4442,7 @@ done:
if (sc->priority <= DEF_PRIORITY - 2)
wait_event_killable(lruvec->mm_state.wait,
max_seq < READ_ONCE(lrugen->max_seq));
-
- return max_seq < READ_ONCE(lrugen->max_seq);
+ return false;
}
VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
@@ -4516,8 +4515,6 @@ static void lru_gen_age_node(struct pgli
VM_WARN_ON_ONCE(!current_is_kswapd());
- sc->last_reclaimed = sc->nr_reclaimed;
-
/* check the order to exclude compaction-induced reclaim */
if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
return;
@@ -5116,8 +5113,7 @@ static bool should_run_aging(struct lruv
* 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
* reclaim.
*/
-static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
- bool can_swap)
+static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool can_swap)
{
unsigned long nr_to_scan;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
@@ -5134,10 +5130,8 @@ static unsigned long get_nr_to_scan(stru
if (sc->priority == DEF_PRIORITY)
return nr_to_scan;
- try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
-
/* skip this lruvec as it's low on cold folios */
- return 0;
+ return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
}
static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5146,29 +5140,18 @@ static unsigned long get_nr_to_reclaim(s
if (!global_reclaim(sc))
return -1;
- /* discount the previous progress for kswapd */
- if (current_is_kswapd())
- return sc->nr_to_reclaim + sc->last_reclaimed;
-
return max(sc->nr_to_reclaim, compact_gap(sc->order));
}
-static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
- struct blk_plug plug;
+ long nr_to_scan;
unsigned long scanned = 0;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
- lru_add_drain();
-
- blk_start_plug(&plug);
-
- set_mm_walk(lruvec_pgdat(lruvec));
-
while (true) {
int delta;
int swappiness;
- unsigned long nr_to_scan;
if (sc->may_swap)
swappiness = get_swappiness(lruvec, sc);
@@ -5178,7 +5161,7 @@ static void lru_gen_shrink_lruvec(struct
swappiness = 0;
nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
- if (!nr_to_scan)
+ if (nr_to_scan <= 0)
break;
delta = evict_folios(lruvec, sc, swappiness);
@@ -5195,10 +5178,251 @@ static void lru_gen_shrink_lruvec(struct
cond_resched();
}
+ /* whether try_to_inc_max_seq() was successful */
+ return nr_to_scan < 0;
+}
+
+static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+ bool success;
+ unsigned long scanned = sc->nr_scanned;
+ unsigned long reclaimed = sc->nr_reclaimed;
+ int seg = lru_gen_memcg_seg(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (!lruvec_is_sizable(lruvec, sc))
+ return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+
+ mem_cgroup_calculate_protection(NULL, memcg);
+
+ if (mem_cgroup_below_min(memcg))
+ return MEMCG_LRU_YOUNG;
+
+ if (mem_cgroup_below_low(memcg)) {
+ /* see the comment on MEMCG_NR_GENS */
+ if (seg != MEMCG_LRU_TAIL)
+ return MEMCG_LRU_TAIL;
+
+ memcg_memory_event(memcg, MEMCG_LOW);
+ }
+
+ success = try_to_shrink_lruvec(lruvec, sc);
+
+ shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+ if (!sc->proactive)
+ vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+ sc->nr_reclaimed - reclaimed);
+
+ sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
+ current->reclaim_state->reclaimed_slab = 0;
+
+ return success ? MEMCG_LRU_YOUNG : 0;
+}
+
+#ifdef CONFIG_MEMCG
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int gen;
+ int bin;
+ int first_bin;
+ struct lruvec *lruvec;
+ struct lru_gen_folio *lrugen;
+ const struct hlist_nulls_node *pos;
+ int op = 0;
+ struct mem_cgroup *memcg = NULL;
+ unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+
+ bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
+restart:
+ gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+
+ rcu_read_lock();
+
+ hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+
+ lruvec = container_of(lrugen, struct lruvec, lrugen);
+ memcg = lruvec_memcg(lruvec);
+
+ if (!mem_cgroup_tryget(memcg)) {
+ op = 0;
+ memcg = NULL;
+ continue;
+ }
+
+ rcu_read_unlock();
+
+ op = shrink_one(lruvec, sc);
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ goto success;
+
+ rcu_read_lock();
+ }
+
+ rcu_read_unlock();
+
+ /* restart if raced with lru_gen_rotate_memcg() */
+ if (gen != get_nulls_value(pos))
+ goto restart;
+
+ /* try the rest of the bins of the current generation */
+ bin = get_memcg_bin(bin + 1);
+ if (bin != first_bin)
+ goto restart;
+success:
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+ struct blk_plug plug;
+
+ VM_WARN_ON_ONCE(global_reclaim(sc));
+
+ lru_add_drain();
+
+ blk_start_plug(&plug);
+
+ set_mm_walk(lruvec_pgdat(lruvec));
+
+ if (try_to_shrink_lruvec(lruvec, sc))
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+
+ clear_mm_walk();
+
+ blk_finish_plug(&plug);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ BUILD_BUG();
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+ BUILD_BUG();
+}
+
+#endif
+
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int priority;
+ unsigned long reclaimable;
+ struct lruvec *lruvec = mem_cgroup_lruvec(NULL, pgdat);
+
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+ * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
+ * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
+ * estimated reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (get_swappiness(lruvec, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+ reclaimable /= MEMCG_NR_GENS;
+
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+ sc->priority = clamp(priority, 0, DEF_PRIORITY);
+}
+
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ struct blk_plug plug;
+ unsigned long reclaimed = sc->nr_reclaimed;
+
+ VM_WARN_ON_ONCE(!global_reclaim(sc));
+
+ lru_add_drain();
+
+ blk_start_plug(&plug);
+
+ set_mm_walk(pgdat);
+
+ set_initial_priority(pgdat, sc);
+
+ if (current_is_kswapd())
+ sc->nr_reclaimed = 0;
+
+ if (mem_cgroup_disabled())
+ shrink_one(&pgdat->__lruvec, sc);
+ else
+ shrink_many(pgdat, sc);
+
+ if (current_is_kswapd())
+ sc->nr_reclaimed += reclaimed;
+
clear_mm_walk();
blk_finish_plug(&plug);
+
+ /* kswapd should never fail */
+ pgdat->kswapd_failures = 0;
+}
+
+#ifdef CONFIG_MEMCG
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+ int seg;
+ int old, new;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ seg = 0;
+ new = old = lruvec->lrugen.gen;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (op == MEMCG_LRU_HEAD)
+ seg = MEMCG_LRU_HEAD;
+ else if (op == MEMCG_LRU_TAIL)
+ seg = MEMCG_LRU_TAIL;
+ else if (op == MEMCG_LRU_OLD)
+ new = get_memcg_gen(pgdat->memcg_lru.seq);
+ else if (op == MEMCG_LRU_YOUNG)
+ new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+ else
+ VM_WARN_ON_ONCE(true);
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+ hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+ else
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+ lruvec->lrugen.gen = new;
+ WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
}
+#endif
/******************************************************************************
* state change
@@ -5656,11 +5880,11 @@ static int run_cmd(char cmd, int memcg_i
if (!mem_cgroup_disabled()) {
rcu_read_lock();
+
memcg = mem_cgroup_from_id(memcg_id);
-#ifdef CONFIG_MEMCG
- if (memcg && !css_tryget(&memcg->css))
+ if (!mem_cgroup_tryget(memcg))
memcg = NULL;
-#endif
+
rcu_read_unlock();
if (!memcg)
@@ -5808,6 +6032,19 @@ void lru_gen_init_lruvec(struct lruvec *
}
#ifdef CONFIG_MEMCG
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+ int i, j;
+
+ spin_lock_init(&pgdat->memcg_lru.lock);
+
+ for (i = 0; i < MEMCG_NR_GENS; i++) {
+ for (j = 0; j < MEMCG_NR_BINS; j++)
+ INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
+ }
+}
+
void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->mm_list.fifo);
@@ -5831,7 +6068,69 @@ void lru_gen_exit_memcg(struct mem_cgrou
}
}
}
-#endif
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+ lruvec->lrugen.gen = gen;
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ for_each_node(nid) {
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+ }
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = lruvec->lrugen.gen;
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ pgdat->memcg_lru.nr_memcgs[gen]--;
+
+ if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+#endif /* CONFIG_MEMCG */
static int __init init_lru_gen(void)
{
@@ -5858,6 +6157,10 @@ static void lru_gen_shrink_lruvec(struct
{
}
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+}
+
#endif /* CONFIG_LRU_GEN */
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5871,7 +6174,7 @@ static void shrink_lruvec(struct lruvec
bool proportional_reclaim;
struct blk_plug plug;
- if (lru_gen_enabled()) {
+ if (lru_gen_enabled() && !global_reclaim(sc)) {
lru_gen_shrink_lruvec(lruvec, sc);
return;
}
@@ -6114,6 +6417,11 @@ static void shrink_node(pg_data_t *pgdat
struct lruvec *target_lruvec;
bool reclaimable = false;
+ if (lru_gen_enabled() && global_reclaim(sc)) {
+ lru_gen_shrink_node(pgdat, sc);
+ return;
+ }
+
target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
again:

View File

@@ -1,202 +0,0 @@
From 11b14ee8cbbbebd8204609076a9327a1171cd253 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:05 -0700
Subject: [PATCH 07/19] BACKPORT: mm: multi-gen LRU: clarify scan_control flags
Among the flags in scan_control:
1. sc->may_swap, which indicates swap constraint due to memsw.max, is
supported as usual.
2. sc->proactive, which indicates reclaim by memory.reclaim, may not
opportunistically skip the aging path, since it is considered less
latency sensitive.
3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint, lowers
swappiness to prioritize file LRU, since clean file folios are more
likely to exist.
4. sc->may_writepage and sc->may_unmap, which indicate opportunistic
reclaim, are rejected, since unmapped clean folios are already
prioritized. Scanning for more of them is likely futile and can
cause high reclaim latency when there is a large number of memcgs.
The rest are handled by the existing code.
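As a rough userspace sketch of rules 1 and 3 above (hypothetical names and
a simplified stand-in for scan_control; the authoritative logic is in the
diff below):

#include <stdio.h>

/* simplified stand-in for struct scan_control */
struct sc_sketch {
	int may_swap; /* swap constraint due to memsw.max */
	int has_io;   /* corresponds to sc->gfp_mask & __GFP_IO */
};

static int effective_swappiness(const struct sc_sketch *sc, int swappiness)
{
	if (!sc->may_swap)
		return 0;          /* rule 1: swapping fully inhibited */
	if (swappiness && !sc->has_io)
		return 1;          /* rule 3: IO constrained, prefer file LRU */
	return swappiness;         /* otherwise the usual swappiness value */
}

int main(void)
{
	struct sc_sketch io_constrained = { .may_swap = 1, .has_io = 0 };

	printf("%d\n", effective_swappiness(&io_constrained, 60)); /* 1 */
	return 0;
}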
Link: https://lkml.kernel.org/r/20221222041905.2431096-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e9d4e1ee788097484606c32122f146d802a9c5fb)
[TJ: Resolved conflict with older function signature for mem_cgroup_below_min, and over
cdded861182142ac4488a4d64c571107aeb77f53 ("ANDROID: MGLRU: Don't skip anon reclaim if swap low")]
Change-Id: Ic2e779eaf4e91a3921831b4e2fa10c740dc59d50
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 55 +++++++++++++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 27 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3185,6 +3185,9 @@ static int get_swappiness(struct lruvec
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ if (!sc->may_swap)
+ return 0;
+
if (!can_demote(pgdat->node_id, sc) &&
mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
return 0;
@@ -4223,7 +4226,7 @@ static void walk_mm(struct lruvec *lruve
} while (err == -EAGAIN);
}
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
+static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool force_alloc)
{
struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
@@ -4231,7 +4234,7 @@ static struct lru_gen_mm_walk *set_mm_wa
VM_WARN_ON_ONCE(walk);
walk = &pgdat->mm_walk;
- } else if (!pgdat && !walk) {
+ } else if (!walk && force_alloc) {
VM_WARN_ON_ONCE(current_is_kswapd());
walk = kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
@@ -4419,7 +4422,7 @@ static bool try_to_inc_max_seq(struct lr
goto done;
}
- walk = set_mm_walk(NULL);
+ walk = set_mm_walk(NULL, true);
if (!walk) {
success = iterate_mm_list_nowalk(lruvec, max_seq);
goto done;
@@ -4488,8 +4491,6 @@ static bool lruvec_is_reclaimable(struct
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
- VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
-
/* see the comment on lru_gen_folio */
gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
@@ -4753,12 +4754,8 @@ static bool isolate_folio(struct lruvec
{
bool success;
- /* unmapping inhibited */
- if (!sc->may_unmap && folio_mapped(folio))
- return false;
-
/* swapping inhibited */
- if (!(sc->may_writepage && (sc->gfp_mask & __GFP_IO)) &&
+ if (!(sc->gfp_mask & __GFP_IO) &&
(folio_test_dirty(folio) ||
(folio_test_anon(folio) && !folio_test_swapcache(folio))))
return false;
@@ -4857,9 +4854,8 @@ static int scan_folios(struct lruvec *lr
__count_vm_events(PGSCAN_ANON + type, isolated);
/*
- * There might not be eligible pages due to reclaim_idx, may_unmap and
- * may_writepage. Check the remaining to prevent livelock if it's not
- * making progress.
+ * There might not be eligible folios due to reclaim_idx. Check the
+ * remaining to prevent livelock if it's not making progress.
*/
return isolated || !remaining ? scanned : 0;
}
@@ -5119,8 +5115,7 @@ static long get_nr_to_scan(struct lruvec
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
- if (mem_cgroup_below_min(memcg) ||
- (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
+ if (mem_cgroup_below_min(memcg))
return 0;
if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
@@ -5148,17 +5143,14 @@ static bool try_to_shrink_lruvec(struct
long nr_to_scan;
unsigned long scanned = 0;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+ int swappiness = get_swappiness(lruvec, sc);
+
+ /* clean file folios are more likely to exist */
+ if (swappiness && !(sc->gfp_mask & __GFP_IO))
+ swappiness = 1;
while (true) {
int delta;
- int swappiness;
-
- if (sc->may_swap)
- swappiness = get_swappiness(lruvec, sc);
- else if (!cgroup_reclaim(sc) && get_swappiness(lruvec, sc))
- swappiness = 1;
- else
- swappiness = 0;
nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
if (nr_to_scan <= 0)
@@ -5289,12 +5281,13 @@ static void lru_gen_shrink_lruvec(struct
struct blk_plug plug;
VM_WARN_ON_ONCE(global_reclaim(sc));
+ VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
lru_add_drain();
blk_start_plug(&plug);
- set_mm_walk(lruvec_pgdat(lruvec));
+ set_mm_walk(NULL, sc->proactive);
if (try_to_shrink_lruvec(lruvec, sc))
lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
@@ -5350,11 +5343,19 @@ static void lru_gen_shrink_node(struct p
VM_WARN_ON_ONCE(!global_reclaim(sc));
+ /*
+ * Unmapped clean folios are already prioritized. Scanning for more of
+ * them is likely futile and can cause high reclaim latency when there
+ * is a large number of memcgs.
+ */
+ if (!sc->may_writepage || !sc->may_unmap)
+ goto done;
+
lru_add_drain();
blk_start_plug(&plug);
- set_mm_walk(pgdat);
+ set_mm_walk(pgdat, sc->proactive);
set_initial_priority(pgdat, sc);
@@ -5372,7 +5373,7 @@ static void lru_gen_shrink_node(struct p
clear_mm_walk();
blk_finish_plug(&plug);
-
+done:
/* kswapd should never fail */
pgdat->kswapd_failures = 0;
}
@@ -5944,7 +5945,7 @@ static ssize_t lru_gen_seq_write(struct
set_task_reclaim_state(current, &sc.reclaim_state);
flags = memalloc_noreclaim_save();
blk_start_plug(&plug);
- if (!set_mm_walk(NULL)) {
+ if (!set_mm_walk(NULL, true)) {
err = -ENOMEM;
goto done;
}

View File

@@ -1,38 +0,0 @@
From 25887d48dff860751a06caa4188bfaf6bfb6e4b2 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:06 -0700
Subject: [PATCH 08/19] UPSTREAM: mm: multi-gen LRU: simplify
arch_has_hw_pte_young() check
Scanning page tables when hardware does not set the accessed bit has
no real use cases.
Link: https://lkml.kernel.org/r/20221222041905.2431096-9-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit f386e9314025ea99dae639ed2032560a92081430)
Change-Id: I84d97ab665b4e3bb862a9bc7d72f50dea7191a6b
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4417,7 +4417,7 @@ static bool try_to_inc_max_seq(struct lr
* handful of PTEs. Spreading the work out over a period of time usually
* is less efficient, but it avoids bursty page faults.
*/
- if (!force_scan && !(arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK))) {
+ if (!arch_has_hw_pte_young() || !get_cap(LRU_GEN_MM_WALK)) {
success = iterate_mm_list_nowalk(lruvec, max_seq);
goto done;
}

View File

@@ -1,92 +0,0 @@
From 620b0ee94455e48d124414cd06d8a53f69fb6453 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Mon, 13 Feb 2023 00:53:22 -0700
Subject: [PATCH 09/19] UPSTREAM: mm: multi-gen LRU: avoid futile retries
Recall that the per-node memcg LRU has two generations and they alternate
when the last memcg (of a given node) is moved from one to the other.
Each generation is also sharded into multiple bins to improve scalability.
A reclaimer starts with a random bin (in the old generation) and, if it
fails, it will retry, i.e., try the rest of the bins.
If a reclaimer fails with the last memcg, it should move this memcg to the
young generation first, which causes the generations to alternate, and
then retry. Otherwise, the retries will be futile because all other bins
are empty.
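A minimal model of that retry rule (hypothetical names and a trivially
failing reclaim stub; the real loop is shrink_many() in the diff below):

#include <stdio.h>

#define NR_BINS 4

/* pretend to reclaim from one bin of the old generation; always fails */
static int try_bin(int gen, int bin)
{
	(void)gen;
	(void)bin;
	return 0;
}

int main(void)
{
	int gen = 0;   /* the old generation */
	int first = 2; /* random starting bin */
	int bin = first;

	do {
		if (try_bin(gen, bin))
			return 0;
		bin = (bin + 1) % NR_BINS;
	} while (bin != first);

	/*
	 * All bins failed: unless the last memcg is rotated to the young
	 * generation (MEMCG_LRU_YOUNG), which makes the generations
	 * alternate, another pass over the same empty bins is futile.
	 */
	printf("rotate the last memcg to young, then restart\n");
	return 1;
}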
Link: https://lkml.kernel.org/r/20230213075322.1416966-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 9f550d78b40da21b4da515db4c37d8d7b12aa1a6)
Change-Id: Ie92535676b005ec9e7987632b742fdde8d54436f
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5218,18 +5218,20 @@ static int shrink_one(struct lruvec *lru
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
+ int op;
int gen;
int bin;
int first_bin;
struct lruvec *lruvec;
struct lru_gen_folio *lrugen;
+ struct mem_cgroup *memcg;
const struct hlist_nulls_node *pos;
- int op = 0;
- struct mem_cgroup *memcg = NULL;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
restart:
+ op = 0;
+ memcg = NULL;
gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
rcu_read_lock();
@@ -5253,14 +5255,22 @@ restart:
op = shrink_one(lruvec, sc);
- if (sc->nr_reclaimed >= nr_to_reclaim)
- goto success;
-
rcu_read_lock();
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ break;
}
rcu_read_unlock();
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ return;
+
/* restart if raced with lru_gen_rotate_memcg() */
if (gen != get_nulls_value(pos))
goto restart;
@@ -5269,11 +5279,6 @@ restart:
bin = get_memcg_bin(bin + 1);
if (bin != first_bin)
goto restart;
-success:
- if (op)
- lru_gen_rotate_memcg(lruvec, op);
-
- mem_cgroup_put(memcg);
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)

View File

@@ -1,191 +0,0 @@
From 70d216c71ff5c5b17dd1da6294f97b91fb6aba7a Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Fri, 30 Dec 2022 14:52:51 -0700
Subject: [PATCH 10/19] UPSTREAM: mm: add vma_has_recency()
Add vma_has_recency() to indicate whether a VMA may exhibit temporal
locality that the LRU algorithm relies on.
This function returns false for VMAs marked by VM_SEQ_READ or
VM_RAND_READ. While the former flag indicates linear access, i.e., a
special case of spatial locality, both flags indicate a lack of temporal
locality, i.e., the reuse of an area within a relatively small duration.
"Recency" is chosen over "locality" to avoid confusion between temporal
and spatial localities.
Before this patch, the active/inactive LRU only ignored the accessed bit
from VMAs marked by VM_SEQ_READ. After this patch, the active/inactive
LRU and MGLRU share the same logic: they both ignore the accessed bit if
vma_has_recency() returns false.
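From userspace, the two flags are set with madvise(); a minimal sketch
using the standard API (the file path is hypothetical):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 1 << 20;
	int fd = open("/tmp/data", O_RDONLY);
	void *p;

	if (fd < 0)
		return 1;
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	/*
	 * Sets VM_SEQ_READ: vma_has_recency() now returns false, so both
	 * the active/inactive LRU and MGLRU ignore the accessed bit for
	 * this mapping.
	 */
	madvise(p, len, MADV_SEQUENTIAL);
	/* ... stream through p ... */
	munmap(p, len);
	close(fd);
	return 0;
}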
For the active/inactive LRU, the following fio test showed a [6, 8]%
increase in IOPS when randomly accessing mapped files under memory
pressure.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fio --name=test --directory=/mnt/ --ioengine=mmap --numjobs=8 \
--size=8G --rw=randrw --time_based --runtime=10m \
--group_reporting
The discussion that led to this patch is here [1]. Additional test
results are available in that thread.
[1] https://lore.kernel.org/r/Y31s%2FK8T85jh05wH@google.com/
Link: https://lkml.kernel.org/r/20221230215252.2628425-1-yuzhao@google.com
Change-Id: I291dcb795197659e40e46539cd32b857677c34ad
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8788f6781486769d9598dcaedc3fe0eb12fc3e59)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/mm_inline.h | 8 ++++++++
mm/memory.c | 7 +++----
mm/rmap.c | 42 +++++++++++++++++----------------------
mm/vmscan.c | 5 ++++-
4 files changed, 33 insertions(+), 29 deletions(-)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -600,4 +600,12 @@ pte_install_uffd_wp_if_needed(struct vm_
#endif
}
+static inline bool vma_has_recency(struct vm_area_struct *vma)
+{
+ if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
+ return false;
+
+ return true;
+}
+
#endif
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1445,8 +1445,7 @@ again:
force_flush = 1;
set_page_dirty(page);
}
- if (pte_young(ptent) &&
- likely(!(vma->vm_flags & VM_SEQ_READ)))
+ if (pte_young(ptent) && likely(vma_has_recency(vma)))
mark_page_accessed(page);
}
rss[mm_counter(page)]--;
@@ -5219,8 +5218,8 @@ static inline void mm_account_fault(stru
#ifdef CONFIG_LRU_GEN
static void lru_gen_enter_fault(struct vm_area_struct *vma)
{
- /* the LRU algorithm doesn't apply to sequential or random reads */
- current->in_lru_fault = !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ));
+ /* the LRU algorithm only applies to accesses with recency */
+ current->in_lru_fault = vma_has_recency(vma);
}
static void lru_gen_exit_fault(void)
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -823,25 +823,14 @@ static bool folio_referenced_one(struct
}
if (pvmw.pte) {
- if (lru_gen_enabled() && pte_young(*pvmw.pte) &&
- !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))) {
+ if (lru_gen_enabled() && pte_young(*pvmw.pte)) {
lru_gen_look_around(&pvmw);
referenced++;
}
if (ptep_clear_flush_young_notify(vma, address,
- pvmw.pte)) {
- /*
- * Don't treat a reference through
- * a sequentially read mapping as such.
- * If the folio has been used in another mapping,
- * we will catch it; if this other mapping is
- * already gone, the unmap path will have set
- * the referenced flag or activated the folio.
- */
- if (likely(!(vma->vm_flags & VM_SEQ_READ)))
- referenced++;
- }
+ pvmw.pte))
+ referenced++;
} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
if (pmdp_clear_flush_young_notify(vma, address,
pvmw.pmd))
@@ -875,7 +864,20 @@ static bool invalid_folio_referenced_vma
struct folio_referenced_arg *pra = arg;
struct mem_cgroup *memcg = pra->memcg;
- if (!mm_match_cgroup(vma->vm_mm, memcg))
+ /*
+ * Ignore references from this mapping if it has no recency. If the
+ * folio has been used in another mapping, we will catch it; if this
+ * other mapping is already gone, the unmap path will have set the
+ * referenced flag or activated the folio in zap_pte_range().
+ */
+ if (!vma_has_recency(vma))
+ return true;
+
+ /*
+ * If we are reclaiming on behalf of a cgroup, skip counting on behalf
+ * of references from different cgroups.
+ */
+ if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
return true;
return false;
@@ -906,6 +908,7 @@ int folio_referenced(struct folio *folio
.arg = (void *)&pra,
.anon_lock = folio_lock_anon_vma_read,
.try_lock = true,
+ .invalid_vma = invalid_folio_referenced_vma,
};
*vm_flags = 0;
@@ -921,15 +924,6 @@ int folio_referenced(struct folio *folio
return 1;
}
- /*
- * If we are reclaiming on behalf of a cgroup, skip
- * counting on behalf of references from different
- * cgroups
- */
- if (memcg) {
- rwc.invalid_vma = invalid_folio_referenced_vma;
- }
-
rmap_walk(folio, &rwc);
*vm_flags = pra.vm_flags;
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3778,7 +3778,10 @@ static int should_skip_vma(unsigned long
if (is_vm_hugetlb_page(vma))
return true;
- if (vma->vm_flags & (VM_LOCKED | VM_SPECIAL | VM_SEQ_READ | VM_RAND_READ))
+ if (!vma_has_recency(vma))
+ return true;
+
+ if (vma->vm_flags & (VM_LOCKED | VM_SPECIAL))
return true;
if (vma == get_gate_vma(vma->vm_mm))

View File

@@ -1,129 +0,0 @@
From 9ca4e437a24dfc4ec6c362f319eb9850b9eca497 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Fri, 30 Dec 2022 14:52:52 -0700
Subject: [PATCH 11/19] UPSTREAM: mm: support POSIX_FADV_NOREUSE
This patch adds POSIX_FADV_NOREUSE to vma_has_recency() so that the LRU
algorithm can ignore access to mapped files marked by this flag.
The advantages of POSIX_FADV_NOREUSE are:
1. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not alter the
default readahead behavior.
2. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not split VMAs and
therefore does not take mmap_lock.
3. Unlike MADV_COLD, setting it has a negligible cost, regardless of
how many pages it affects.
Its limitations are:
1. Like POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL, it currently does
not support range. IOW, its scope is the entire file.
2. It currently does not ignore access through file descriptors.
Specifically, for the active/inactive LRU, given a file page shared
by two users and one of them having set POSIX_FADV_NOREUSE on the
file, this page will be activated upon the second user accessing
it. This corner case can be covered by checking POSIX_FADV_NOREUSE
before calling folio_mark_accessed() on the read path. But it is
considered not worth the effort.
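Setting the hint is a single call per file descriptor; a minimal sketch
using the standard POSIX API (the file path is hypothetical):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/tmp/bigfile", O_RDONLY);

	if (fd < 0)
		return 1;
	/* len == 0 means to the end of the file, matching limitation 1 */
	posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
	/* ... mmap() the file and stream through it ... */
	close(fd);
	return 0;
}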
There have been a few attempts to support POSIX_FADV_NOREUSE, e.g., [1].
This time the goal is to fill a niche: a few desktop applications, e.g.,
large file transferring and video encoding/decoding, want fast file
streaming with mmap() rather than direct IO. Among those applications, an
SVT-AV1 regression was reported when running with MGLRU [2]. The
following test can reproduce that regression.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fallocate -l 8G /mnt/swapfile
mkswap /mnt/swapfile
swapon /mnt/swapfile
wget http://ultravideo.cs.tut.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z e -o/mnt/ Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp --preset 12 -w 3840 -h 2160 \
-i /mnt/Bosphorus_3840x2160.y4m
For MGLRU, the following change showed a [9-11]% increase in FPS,
which makes it on par with the active/inactive LRU.
patch Source/App/EncApp/EbAppMain.c <<EOF
31a32
> #include <fcntl.h>
35d35
< #include <fcntl.h> /* _O_BINARY */
117a118
> posix_fadvise(config->mmap.fd, 0, 0, POSIX_FADV_NOREUSE);
EOF
[1] https://lore.kernel.org/r/1308923350-7932-1-git-send-email-andrea@betterlinux.com/
[2] https://openbenchmarking.org/result/2209259-PTS-MGLRU8GB57
Link: https://lkml.kernel.org/r/20221230215252.2628425-2-yuzhao@google.com
Change-Id: I0b7f5f971d78014ea1ba44cee6a8ec902a4330d0
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17e810229cb3068b692fa078bd9b3a6527e0866a)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/fs.h | 2 ++
include/linux/mm_inline.h | 3 +++
mm/fadvise.c | 5 ++++-
3 files changed, 9 insertions(+), 1 deletion(-)
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -166,6 +166,8 @@ typedef int (dio_iodone_t)(struct kiocb
/* File supports DIRECT IO */
#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000)
+#define FMODE_NOREUSE ((__force fmode_t)0x800000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -605,6 +605,9 @@ static inline bool vma_has_recency(struc
if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
return false;
+ if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
+ return false;
+
return true;
}
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -80,7 +80,7 @@ int generic_fadvise(struct file *file, l
case POSIX_FADV_NORMAL:
file->f_ra.ra_pages = bdi->ra_pages;
spin_lock(&file->f_lock);
- file->f_mode &= ~FMODE_RANDOM;
+ file->f_mode &= ~(FMODE_RANDOM | FMODE_NOREUSE);
spin_unlock(&file->f_lock);
break;
case POSIX_FADV_RANDOM:
@@ -107,6 +107,9 @@ int generic_fadvise(struct file *file, l
force_page_cache_readahead(mapping, file, start_index, nrpages);
break;
case POSIX_FADV_NOREUSE:
+ spin_lock(&file->f_lock);
+ file->f_mode |= FMODE_NOREUSE;
+ spin_unlock(&file->f_lock);
break;
case POSIX_FADV_DONTNEED:
__filemap_fdatawrite_range(mapping, offset, endbyte,

View File

@@ -1,67 +0,0 @@
From 1b5e4c317d80f4826eceb3781702d18d06b14394 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:21 +0000
Subject: [PATCH 12/19] UPSTREAM: mm: multi-gen LRU: section for working set
protection
Patch series "mm: multi-gen LRU: improve".
This patch series improves a few MGLRU functions, collects related
functions, and adds additional documentation.
This patch (of 7):
Add a section for working set protection in the code and the design doc.
The admin doc already contains its usage.
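For reference, the knob is a single sysfs file; a minimal sketch of
protecting the last second's working set (the 1000ms value is arbitrary):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/mm/lru_gen/min_ttl_ms", "w");

	if (!f)
		return 1;
	/* protect generations younger than 1000ms from eviction */
	fprintf(f, "%d\n", 1000);
	fclose(f);
	return 0;
}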
Link: https://lkml.kernel.org/r/20230118001827.1040870-1-talumbau@google.com
Link: https://lkml.kernel.org/r/20230118001827.1040870-2-talumbau@google.com
Change-Id: I65599075fd42951db7739a2ab7cee78516e157b3
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7b8144e63d84716f16a1b929e0c7e03ae5c4d5c1)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 15 +++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 19 insertions(+)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -141,6 +141,21 @@ loop has detected outlying refaults from
this end, the feedback loop uses the first tier as the baseline, for
the reason stated earlier.
+Working set protection
+----------------------
+Each generation is timestamped at birth. If ``lru_gen_min_ttl`` is
+set, an ``lruvec`` is protected from the eviction when its oldest
+generation was born within ``lru_gen_min_ttl`` milliseconds. In other
+words, it prevents the working set of ``lru_gen_min_ttl`` milliseconds
+from getting evicted. The OOM killer is triggered if this working set
+cannot be kept in memory.
+
+This time-based approach has the following advantages:
+
+1. It is easier to configure because it is agnostic to applications
+ and memory sizes.
+2. It is more reliable because it is directly wired to the OOM killer.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4461,6 +4461,10 @@ done:
return true;
}
+/******************************************************************************
+ * working set protection
+ ******************************************************************************/
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;

View File

@@ -1,57 +0,0 @@
From 5ddf9d53d375e42af49b744bd7c2f8247c6bce15 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:22 +0000
Subject: [PATCH 13/19] UPSTREAM: mm: multi-gen LRU: section for rmap/PT walk
feedback
Add a section for lru_gen_look_around() in the code and the design doc.
Link: https://lkml.kernel.org/r/20230118001827.1040870-3-talumbau@google.com
Change-Id: I5097af63f61b3b69ec2abee6cdbdc33c296df213
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit db19a43d9b3a8876552f00f656008206ef9a5efa)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 14 ++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 18 insertions(+)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -156,6 +156,20 @@ This time-based approach has the followi
and memory sizes.
2. It is more reliable because it is directly wired to the OOM killer.
+Rmap/PT walk feedback
+---------------------
+Searching the rmap for PTEs mapping each page on an LRU list (to test
+and clear the accessed bit) can be expensive because pages from
+different VMAs (PA space) are not cache friendly to the rmap (VA
+space). For workloads mostly using mapped pages, searching the rmap
+can incur the highest CPU cost in the reclaim path.
+
+``lru_gen_look_around()`` exploits spatial locality to reduce the
+trips into the rmap. It scans the adjacent PTEs of a young PTE and
+promotes hot pages. If the scan was done cacheline efficiently, it
+adds the PMD entry pointing to the PTE table to the Bloom filter. This
+forms a feedback loop between the eviction and the aging.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4555,6 +4555,10 @@ static void lru_gen_age_node(struct pgli
}
}
+/******************************************************************************
+ * rmap/PT walk feedback
+ ******************************************************************************/
+
/*
* This function exploits spatial locality when shrink_folio_list() walks the
* rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If

View File

@@ -1,243 +0,0 @@
From 397624e12244ec038f51cb1f178ccb7a2ec562e5 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:23 +0000
Subject: [PATCH 14/19] UPSTREAM: mm: multi-gen LRU: section for Bloom filters
Move the Bloom filter code into a dedicated section. Improve the design doc
to explain Bloom filter usage and the connection between the aging and the
eviction in their use.
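A compact userspace rendering of the m=1<<15, k=2 scheme described in the
moved code (the mixer function is made up, standing in for the kernel's
hash_ptr(); this is an illustrative sketch, not the kernel implementation):

#include <stdint.h>
#include <stdio.h>

#define BLOOM_SHIFT 15
#define BLOOM_BITS (1u << BLOOM_SHIFT)

static uint64_t filter[BLOOM_BITS / 64];

/* made-up mixer standing in for hash_ptr(item, BLOOM_FILTER_SHIFT * 2) */
static uint32_t hash30(const void *item)
{
	uint64_t x = (uint64_t)(uintptr_t)item;

	x *= 0x9e3779b97f4a7c15ULL;
	x ^= x >> 32;
	return (uint32_t)x & ((1u << (2 * BLOOM_SHIFT)) - 1);
}

static void get_keys(const void *item, int key[2])
{
	uint32_t hash = hash30(item);

	key[0] = hash & (BLOOM_BITS - 1);
	key[1] = hash >> BLOOM_SHIFT;
}

static void bloom_set(const void *item)
{
	int key[2];

	get_keys(item, key);
	filter[key[0] / 64] |= 1ULL << (key[0] % 64);
	filter[key[1] / 64] |= 1ULL << (key[1] % 64);
}

/* 0 means definitely not in the set; 1 means may be in the set */
static int bloom_test(const void *item)
{
	int key[2];

	get_keys(item, key);
	return (filter[key[0] / 64] >> (key[0] % 64) & 1) &&
	       (filter[key[1] / 64] >> (key[1] % 64) & 1);
}

int main(void)
{
	int x, y;

	bloom_set(&x);
	printf("%d %d\n", bloom_test(&x), bloom_test(&y)); /* 1, usually 0 */
	return 0;
}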
Link: https://lkml.kernel.org/r/20230118001827.1040870-4-talumbau@google.com
Change-Id: I73e866f687c1ed9f5c8538086aa39408b79897db
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ccbbbb85945d8f0255aa9dbc1b617017e2294f2c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 16 +++
mm/vmscan.c | 180 +++++++++++++++---------------
2 files changed, 108 insertions(+), 88 deletions(-)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -170,6 +170,22 @@ promotes hot pages. If the scan was done
adds the PMD entry pointing to the PTE table to the Bloom filter. This
forms a feedback loop between the eviction and the aging.
+Bloom Filters
+-------------
+Bloom filters are a space and memory efficient data structure for set
+membership test, i.e., test if an element is not in the set or may be
+in the set.
+
+In the eviction path, specifically, in ``lru_gen_look_around()``, if a
+PMD has a sufficient number of hot pages, its address is placed in the
+filter. In the aging path, set membership means that the PTE range
+will be scanned for young pages.
+
+Note that Bloom filters are probabilistic on set membership. If a test
+is false positive, the cost is an additional scan of a range of PTEs,
+which may yield hot pages anyway. Parameters of the filter itself can
+control the false positive rate in the limit.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3209,6 +3209,98 @@ static bool __maybe_unused seq_is_valid(
}
/******************************************************************************
+ * Bloom filters
+ ******************************************************************************/
+
+/*
+ * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
+ * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
+ * bits in a bitmap, k is the number of hash functions and n is the number of
+ * inserted items.
+ *
+ * Page table walkers use one of the two filters to reduce their search space.
+ * To get rid of non-leaf entries that no longer have enough leaf entries, the
+ * aging uses the double-buffering technique to flip to the other filter each
+ * time it produces a new generation. For non-leaf entries that have enough
+ * leaf entries, the aging carries them over to the next generation in
+ * walk_pmd_range(); the eviction also reports them when walking the rmap
+ * in lru_gen_look_around().
+ *
+ * For future optimizations:
+ * 1. It's not necessary to keep both filters all the time. The spare one can be
+ * freed after the RCU grace period and reallocated if needed again.
+ * 2. And when reallocating, it's worth scaling its size according to the number
+ * of inserted entries in the other filter, to reduce the memory overhead on
+ * small systems and false positives on large systems.
+ * 3. Jenkins' hash function is an alternative to Knuth's.
+ */
+#define BLOOM_FILTER_SHIFT 15
+
+static inline int filter_gen_from_seq(unsigned long seq)
+{
+ return seq % NR_BLOOM_FILTERS;
+}
+
+static void get_item_key(void *item, int *key)
+{
+ u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
+
+ BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
+
+ key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
+ key[1] = hash >> BLOOM_FILTER_SHIFT;
+}
+
+static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return true;
+
+ get_item_key(item, key);
+
+ return test_bit(key[0], filter) && test_bit(key[1], filter);
+}
+
+static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return;
+
+ get_item_key(item, key);
+
+ if (!test_bit(key[0], filter))
+ set_bit(key[0], filter);
+ if (!test_bit(key[1], filter))
+ set_bit(key[1], filter);
+}
+
+static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
+{
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = lruvec->mm_state.filters[gen];
+ if (filter) {
+ bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
+ return;
+ }
+
+ filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
+ __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
+}
+
+/******************************************************************************
* mm_struct list
******************************************************************************/
@@ -3333,94 +3425,6 @@ void lru_gen_migrate_mm(struct mm_struct
}
#endif
-/*
- * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
- * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
- * bits in a bitmap, k is the number of hash functions and n is the number of
- * inserted items.
- *
- * Page table walkers use one of the two filters to reduce their search space.
- * To get rid of non-leaf entries that no longer have enough leaf entries, the
- * aging uses the double-buffering technique to flip to the other filter each
- * time it produces a new generation. For non-leaf entries that have enough
- * leaf entries, the aging carries them over to the next generation in
- * walk_pmd_range(); the eviction also reports them when walking the rmap
- * in lru_gen_look_around().
- *
- * For future optimizations:
- * 1. It's not necessary to keep both filters all the time. The spare one can be
- * freed after the RCU grace period and reallocated if needed again.
- * 2. And when reallocating, it's worth scaling its size according to the number
- * of inserted entries in the other filter, to reduce the memory overhead on
- * small systems and false positives on large systems.
- * 3. Jenkins' hash function is an alternative to Knuth's.
- */
-#define BLOOM_FILTER_SHIFT 15
-
-static inline int filter_gen_from_seq(unsigned long seq)
-{
- return seq % NR_BLOOM_FILTERS;
-}
-
-static void get_item_key(void *item, int *key)
-{
- u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
-
- BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
-
- key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
- key[1] = hash >> BLOOM_FILTER_SHIFT;
-}
-
-static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
-{
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = lruvec->mm_state.filters[gen];
- if (filter) {
- bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
- return;
- }
-
- filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
- __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
- WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
-}
-
-static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return;
-
- get_item_key(item, key);
-
- if (!test_bit(key[0], filter))
- set_bit(key[0], filter);
- if (!test_bit(key[1], filter))
- set_bit(key[1], filter);
-}
-
-static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return true;
-
- get_item_key(item, key);
-
- return test_bit(key[0], filter) && test_bit(key[1], filter);
-}
-
static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last)
{
int i;

View File

@@ -1,427 +0,0 @@
From 48c916b812652f9453be5bd45a703728926d41ca Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:24 +0000
Subject: [PATCH 15/19] UPSTREAM: mm: multi-gen LRU: section for memcg LRU
Move memcg LRU code into a dedicated section. Improve the design doc to
outline its architecture.
Link: https://lkml.kernel.org/r/20230118001827.1040870-5-talumbau@google.com
Change-Id: Id252e420cff7a858acb098cf2b3642da5c40f602
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 36c7b4db7c942ae9e1b111f0c6b468c8b2e33842)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 33 +++-
include/linux/mm_inline.h | 17 --
include/linux/mmzone.h | 13 +-
mm/memcontrol.c | 8 +-
mm/vmscan.c | 250 +++++++++++++++++-------------
5 files changed, 178 insertions(+), 143 deletions(-)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -186,9 +186,40 @@ is false positive, the cost is an additi
which may yield hot pages anyway. Parameters of the filter itself can
control the false positive rate in the limit.
+Memcg LRU
+---------
+An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
+since each node and memcg combination has an LRU of folios (see
+``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
+global reclaim, which is critical to system-wide memory overcommit in
+data centers. Note that memcg LRU only applies to global reclaim.
+
+The basic structure of an memcg LRU can be understood by an analogy to
+the active/inactive LRU (of folios):
+
+1. It has the young and the old (generations), i.e., the counterparts
+ to the active and the inactive;
+2. The increment of ``max_seq`` triggers promotion, i.e., the
+ counterpart to activation;
+3. Other events trigger similar operations, e.g., offlining an memcg
+ triggers demotion, i.e., the counterpart to deactivation.
+
+In terms of global reclaim, it has two distinct features:
+
+1. Sharding, which allows each thread to start at a random memcg (in
+ the old generation) and improves parallelism;
+2. Eventual fairness, which allows direct reclaim to bail out at will
+ and reduces latency without affecting fairness over some time.
+
+In terms of traversing memcgs during global reclaim, it improves the
+best-case complexity from O(n) to O(1) and does not affect the
+worst-case complexity O(n). Therefore, on average, it has a sublinear
+complexity.
+
Summary
-------
-The multi-gen LRU can be disassembled into the following parts:
+The multi-gen LRU (of folios) can be disassembled into the following
+parts:
* Generations
* Rmap walks
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,18 +122,6 @@ static inline bool lru_gen_in_fault(void
return current->in_lru_fault;
}
-#ifdef CONFIG_MEMCG
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return READ_ONCE(lruvec->lrugen.seg);
-}
-#else
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-#endif
-
static inline int lru_gen_from_seq(unsigned long seq)
{
return seq % MAX_NR_GENS;
@@ -314,11 +302,6 @@ static inline bool lru_gen_in_fault(void
return false;
}
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-
static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
return false;
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -368,15 +368,6 @@ struct page_vma_mapped_walk;
#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
-/* see the comment on MEMCG_NR_GENS */
-enum {
- MEMCG_LRU_NOP,
- MEMCG_LRU_HEAD,
- MEMCG_LRU_TAIL,
- MEMCG_LRU_OLD,
- MEMCG_LRU_YOUNG,
-};
-
#ifdef CONFIG_LRU_GEN
enum {
@@ -557,7 +548,7 @@ void lru_gen_exit_memcg(struct mem_cgrou
void lru_gen_online_memcg(struct mem_cgroup *memcg);
void lru_gen_offline_memcg(struct mem_cgroup *memcg);
void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+void lru_gen_soft_reclaim(struct lruvec *lruvec);
#else /* !CONFIG_MEMCG */
@@ -608,7 +599,7 @@ static inline void lru_gen_release_memcg
{
}
-static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+static inline void lru_gen_soft_reclaim(struct lruvec *lruvec)
{
}
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -478,12 +478,8 @@ static void mem_cgroup_update_tree(struc
struct mem_cgroup_tree_per_node *mctz;
if (lru_gen_enabled()) {
- struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
-
- /* see the comment on MEMCG_NR_GENS */
- if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-
+ if (soft_limit_excess(memcg))
+ lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
return;
}
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4692,6 +4692,148 @@ void lru_gen_look_around(struct page_vma
}
/******************************************************************************
+ * memcg LRU
+ ******************************************************************************/
+
+/* see the comment on MEMCG_NR_GENS */
+enum {
+ MEMCG_LRU_NOP,
+ MEMCG_LRU_HEAD,
+ MEMCG_LRU_TAIL,
+ MEMCG_LRU_OLD,
+ MEMCG_LRU_YOUNG,
+};
+
+#ifdef CONFIG_MEMCG
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return READ_ONCE(lruvec->lrugen.seg);
+}
+
+static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+ int seg;
+ int old, new;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ seg = 0;
+ new = old = lruvec->lrugen.gen;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (op == MEMCG_LRU_HEAD)
+ seg = MEMCG_LRU_HEAD;
+ else if (op == MEMCG_LRU_TAIL)
+ seg = MEMCG_LRU_TAIL;
+ else if (op == MEMCG_LRU_OLD)
+ new = get_memcg_gen(pgdat->memcg_lru.seq);
+ else if (op == MEMCG_LRU_YOUNG)
+ new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+ else
+ VM_WARN_ON_ONCE(true);
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+ hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+ else
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+ lruvec->lrugen.gen = new;
+ WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+}
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+ lruvec->lrugen.gen = gen;
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ for_each_node(nid) {
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+ }
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = lruvec->lrugen.gen;
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ pgdat->memcg_lru.nr_memcgs[gen]--;
+
+ if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_soft_reclaim(struct lruvec *lruvec)
+{
+ /* see the comment on MEMCG_NR_GENS */
+ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+
+#endif
+
+/******************************************************************************
* the eviction
******************************************************************************/
@@ -5398,53 +5540,6 @@ done:
pgdat->kswapd_failures = 0;
}
-#ifdef CONFIG_MEMCG
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
- int seg;
- int old, new;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- seg = 0;
- new = old = lruvec->lrugen.gen;
-
- /* see the comment on MEMCG_NR_GENS */
- if (op == MEMCG_LRU_HEAD)
- seg = MEMCG_LRU_HEAD;
- else if (op == MEMCG_LRU_TAIL)
- seg = MEMCG_LRU_TAIL;
- else if (op == MEMCG_LRU_OLD)
- new = get_memcg_gen(pgdat->memcg_lru.seq);
- else if (op == MEMCG_LRU_YOUNG)
- new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
- else
- VM_WARN_ON_ONCE(true);
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
- if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
- hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
- else
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
- pgdat->memcg_lru.nr_memcgs[old]--;
- pgdat->memcg_lru.nr_memcgs[new]++;
-
- lruvec->lrugen.gen = new;
- WRITE_ONCE(lruvec->lrugen.seg, seg);
-
- if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
-}
-#endif
-
/******************************************************************************
* state change
******************************************************************************/
@@ -6090,67 +6185,6 @@ void lru_gen_exit_memcg(struct mem_cgrou
}
}
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
- pgdat->memcg_lru.nr_memcgs[gen]++;
-
- lruvec->lrugen.gen = gen;
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
- int nid;
-
- for_each_node(nid) {
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
- }
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = lruvec->lrugen.gen;
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
- pgdat->memcg_lru.nr_memcgs[gen]--;
-
- if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
#endif /* CONFIG_MEMCG */
static int __init init_lru_gen(void)

View File

@@ -1,40 +0,0 @@
From bec433f29537652ed054148edfd7e2183ddcf7c3 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:25 +0000
Subject: [PATCH 16/19] UPSTREAM: mm: multi-gen LRU: improve
lru_gen_exit_memcg()
Add warnings and poison ->next.
Link: https://lkml.kernel.org/r/20230118001827.1040870-6-talumbau@google.com
Change-Id: I53de9e04c1ae941e122b33cd45d2bbb5f34aae0c
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 37cc99979d04cca677c0ad5c0acd1149ec165d1b)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6172,12 +6172,17 @@ void lru_gen_exit_memcg(struct mem_cgrou
int i;
int nid;
+ VM_WARN_ON_ONCE(!list_empty(&memcg->mm_list.fifo));
+
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));
+ lruvec->lrugen.list.next = LIST_POISON1;
+
for (i = 0; i < NR_BLOOM_FILTERS; i++) {
bitmap_free(lruvec->mm_state.filters[i]);
lruvec->mm_state.filters[i] = NULL;

View File

@@ -1,135 +0,0 @@
From fc0e3b06e0f19917b7ecad7967a72f61d4743644 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:26 +0000
Subject: [PATCH 17/19] UPSTREAM: mm: multi-gen LRU: improve walk_pmd_range()
Improve readability of walk_pmd_range() and walk_pmd_range_locked().
Link: https://lkml.kernel.org/r/20230118001827.1040870-7-talumbau@google.com
Change-Id: Ia084fbf53fe989673b7804ca8ca520af12d7d52a
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b5ff4133617d0eced35b685da0bd0929dd9fabb7)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3980,8 +3980,8 @@ restart:
}
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
int i;
pmd_t *pmd;
@@ -3994,18 +3994,19 @@ static void walk_pmd_range_locked(pud_t
VM_WARN_ON_ONCE(pud_leaf(*pud));
/* try to batch at most 1+MIN_LRU_BATCH+1 entries */
- if (*start == -1) {
- *start = next;
+ if (*first == -1) {
+ *first = addr;
+ bitmap_zero(bitmap, MIN_LRU_BATCH);
return;
}
- i = next == -1 ? 0 : pmd_index(next) - pmd_index(*start);
+ i = addr == -1 ? 0 : pmd_index(addr) - pmd_index(*first);
if (i && i <= MIN_LRU_BATCH) {
__set_bit(i - 1, bitmap);
return;
}
- pmd = pmd_offset(pud, *start);
+ pmd = pmd_offset(pud, *first);
ptl = pmd_lockptr(args->mm, pmd);
if (!spin_trylock(ptl))
@@ -4016,15 +4017,16 @@ static void walk_pmd_range_locked(pud_t
do {
unsigned long pfn;
struct folio *folio;
- unsigned long addr = i ? (*start & PMD_MASK) + i * PMD_SIZE : *start;
+
+ /* don't round down the first address */
+ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
pfn = get_pmd_pfn(pmd[i], vma, addr);
if (pfn == -1)
goto next;
if (!pmd_trans_huge(pmd[i])) {
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG))
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
pmdp_test_and_clear_young(vma, addr, pmd + i);
goto next;
}
@@ -4053,12 +4055,11 @@ next:
arch_leave_lazy_mmu_mode();
spin_unlock(ptl);
done:
- *start = -1;
- bitmap_zero(bitmap, MIN_LRU_BATCH);
+ *first = -1;
}
#else
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
}
#endif
@@ -4071,9 +4072,9 @@ static void walk_pmd_range(pud_t *pud, u
unsigned long next;
unsigned long addr;
struct vm_area_struct *vma;
- unsigned long pos = -1;
+ unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)];
+ unsigned long first = -1;
struct lru_gen_mm_walk *walk = args->private;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
VM_WARN_ON_ONCE(pud_leaf(*pud));
@@ -4115,18 +4116,17 @@ restart:
if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
continue;
}
#endif
walk->mm_stats[MM_NONLEAF_TOTAL]++;
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG)) {
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG)) {
if (!pmd_young(val))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
}
if (!walk->force_scan && !test_bloom_filter(walk->lruvec, walk->max_seq, pmd + i))
@@ -4143,7 +4143,7 @@ restart:
update_bloom_filter(walk->lruvec, walk->max_seq + 1, pmd + i);
}
- walk_pmd_range_locked(pud, -1, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
if (i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
goto restart;

View File

@@ -1,148 +0,0 @@
From e604c3ccb4dfbdde2467fccef9bb36170a392695 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:27 +0000
Subject: [PATCH 18/19] UPSTREAM: mm: multi-gen LRU: simplify
lru_gen_look_around()
Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE.
This causes a measurable regression from the LRU lock contention during a
microbenchmark. But avoiding such a tiny regression is not worth the added complexity.
Link: https://lkml.kernel.org/r/20230118001827.1040870-8-talumbau@google.com
Change-Id: I9ce18b4f4062e6c1c13c98ece9422478eb8e1846
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit abf086721a2f1e6897c57796f7268df1b194c750)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 73 +++++++++++++++++------------------------------------
1 file changed, 23 insertions(+), 50 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4573,13 +4573,12 @@ static void lru_gen_age_node(struct pgli
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
{
int i;
- pte_t *pte;
unsigned long start;
unsigned long end;
- unsigned long addr;
struct lru_gen_mm_walk *walk;
int young = 0;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
+ pte_t *pte = pvmw->pte;
+ unsigned long addr = pvmw->address;
struct folio *folio = pfn_folio(pvmw->pfn);
struct mem_cgroup *memcg = folio_memcg(folio);
struct pglist_data *pgdat = folio_pgdat(folio);
@@ -4596,25 +4595,28 @@ void lru_gen_look_around(struct page_vma
/* avoid taking the LRU lock under the PTL when possible */
walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
- start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start);
- end = min(pvmw->address | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+ start = max(addr & PMD_MASK, pvmw->vma->vm_start);
+ end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
- if (pvmw->address - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
end = start + MIN_LRU_BATCH * PAGE_SIZE;
- else if (end - pvmw->address < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ else if (end - addr < MIN_LRU_BATCH * PAGE_SIZE / 2)
start = end - MIN_LRU_BATCH * PAGE_SIZE;
else {
- start = pvmw->address - MIN_LRU_BATCH * PAGE_SIZE / 2;
- end = pvmw->address + MIN_LRU_BATCH * PAGE_SIZE / 2;
+ start = addr - MIN_LRU_BATCH * PAGE_SIZE / 2;
+ end = addr + MIN_LRU_BATCH * PAGE_SIZE / 2;
}
}
- pte = pvmw->pte - (pvmw->address - start) / PAGE_SIZE;
+ /* folio_update_gen() requires stable folio_memcg() */
+ if (!mem_cgroup_trylock_pages(memcg))
+ return;
- rcu_read_lock();
arch_enter_lazy_mmu_mode();
+ pte -= (addr - start) / PAGE_SIZE;
+
for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
unsigned long pfn;
@@ -4639,56 +4641,27 @@ void lru_gen_look_around(struct page_vma
!folio_test_swapcache(folio)))
folio_mark_dirty(folio);
+ if (walk) {
+ old_gen = folio_update_gen(folio, new_gen);
+ if (old_gen >= 0 && old_gen != new_gen)
+ update_batch_size(walk, folio, old_gen, new_gen);
+
+ continue;
+ }
+
old_gen = folio_lru_gen(folio);
if (old_gen < 0)
folio_set_referenced(folio);
else if (old_gen != new_gen)
- __set_bit(i, bitmap);
+ folio_activate(folio);
}
arch_leave_lazy_mmu_mode();
- rcu_read_unlock();
+ mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (suitable_to_scan(i, young))
update_bloom_filter(lruvec, max_seq, pvmw->pmd);
-
- if (!walk && bitmap_weight(bitmap, MIN_LRU_BATCH) < PAGEVEC_SIZE) {
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- folio_activate(folio);
- }
- return;
- }
-
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return;
-
- if (!walk) {
- spin_lock_irq(&lruvec->lru_lock);
- new_gen = lru_gen_from_seq(lruvec->lrugen.max_seq);
- }
-
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- if (folio_memcg_rcu(folio) != memcg)
- continue;
-
- old_gen = folio_update_gen(folio, new_gen);
- if (old_gen < 0 || old_gen == new_gen)
- continue;
-
- if (walk)
- update_batch_size(walk, folio, old_gen, new_gen);
- else
- lru_gen_update_size(lruvec, folio, old_gen, new_gen);
- }
-
- if (!walk)
- spin_unlock_irq(&lruvec->lru_lock);
-
- mem_cgroup_unlock_pages();
}
/******************************************************************************

View File

@@ -1,273 +0,0 @@
From 418038c22452df38cde519cc8c662bb15139764a Mon Sep 17 00:00:00 2001
From: Kalesh Singh <kaleshsingh@google.com>
Date: Thu, 13 Apr 2023 14:43:26 -0700
Subject: [PATCH 19/19] mm: Multi-gen LRU: remove wait_event_killable()
Android 14 and later default to MGLRU [1] and field telemetry showed
occasional long tail latency (>100ms) in the reclaim path.
Tracing revealed priority inversion in the reclaim path. In
try_to_inc_max_seq(), when high priority tasks were blocked on
wait_event_killable(), the preemption of the low priority task to call
wake_up_all() caused those high priority tasks to wait longer than
necessary. In general, this problem is not different from others of its
kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU
because it introduced the new wait queue lruvec->mm_state.wait.
The purpose of this new wait queue is to avoid the thundering herd
problem. If many direct reclaimers rush into try_to_inc_max_seq(), only
one can succeed, i.e., the one to wake up the rest, and the rest who
failed might cause premature OOM kills if they do not wait. So far there
is no evidence supporting this scenario, based on how often the wait has
been hit, which raises the question of how useful the wait queue is in
practice.
Based on Minchan's recommendation, which is in line with his commit
6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the
rest of the MGLRU code which also uses trylock when possible, remove the
wait queue.
[1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf
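To make the trade-off concrete, here is a minimal userspace C sketch
(not the kernel code; all names are made up). A wait queue parks the
losers until the winner calls wake_up_all(); the trylock shape below
instead has the losers bail out immediately, so a preempted winner
cannot stall them:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag iteration_busy = ATOMIC_FLAG_INIT;

/* Only one caller runs work(); concurrent callers return false right
 * away instead of sleeping on a queue. */
static bool try_do_iteration(void (*work)(void))
{
	if (atomic_flag_test_and_set(&iteration_busy))
		return false;
	work();
	atomic_flag_clear(&iteration_busy);
	return true;
}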
Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wei Wang <wvw@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 8 +--
mm/vmscan.c | 112 +++++++++++++++--------------------------
2 files changed, 42 insertions(+), 78 deletions(-)
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -453,18 +453,14 @@ enum {
struct lru_gen_mm_state {
/* set to max_seq after each iteration */
unsigned long seq;
- /* where the current iteration continues (inclusive) */
+ /* where the current iteration continues after */
struct list_head *head;
- /* where the last iteration ended (exclusive) */
+ /* where the last iteration ended before */
struct list_head *tail;
- /* to wait for the last page table walker to finish */
- struct wait_queue_head wait;
/* Bloom filters flip after each iteration */
unsigned long *filters[NR_BLOOM_FILTERS];
/* the mm stats for debugging */
unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
- /* the number of concurrent page table walkers */
- int nr_walkers;
};
struct lru_gen_mm_walk {
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3371,18 +3371,13 @@ void lru_gen_del_mm(struct mm_struct *mm
if (!lruvec)
continue;
- /* where the last iteration ended (exclusive) */
+ /* where the current iteration continues after */
+ if (lruvec->mm_state.head == &mm->lru_gen.list)
+ lruvec->mm_state.head = lruvec->mm_state.head->prev;
+
+ /* where the last iteration ended before */
if (lruvec->mm_state.tail == &mm->lru_gen.list)
lruvec->mm_state.tail = lruvec->mm_state.tail->next;
-
- /* where the current iteration continues (inclusive) */
- if (lruvec->mm_state.head != &mm->lru_gen.list)
- continue;
-
- lruvec->mm_state.head = lruvec->mm_state.head->next;
- /* the deletion ends the current iteration */
- if (lruvec->mm_state.head == &mm_list->fifo)
- WRITE_ONCE(lruvec->mm_state.seq, lruvec->mm_state.seq + 1);
}
list_del_init(&mm->lru_gen.list);
@@ -3478,68 +3473,54 @@ static bool iterate_mm_list(struct lruve
struct mm_struct **iter)
{
bool first = false;
- bool last = true;
+ bool last = false;
struct mm_struct *mm = NULL;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
struct lru_gen_mm_state *mm_state = &lruvec->mm_state;
/*
- * There are four interesting cases for this page table walker:
- * 1. It tries to start a new iteration of mm_list with a stale max_seq;
- * there is nothing left to do.
- * 2. It's the first of the current generation, and it needs to reset
- * the Bloom filter for the next generation.
- * 3. It reaches the end of mm_list, and it needs to increment
- * mm_state->seq; the iteration is done.
- * 4. It's the last of the current generation, and it needs to reset the
- * mm stats counters for the next generation.
+ * mm_state->seq is incremented after each iteration of mm_list. There
+ * are three interesting cases for this page table walker:
+ * 1. It tries to start a new iteration with a stale max_seq: there is
+ * nothing left to do.
+ * 2. It started the next iteration: it needs to reset the Bloom filter
+ * so that a fresh set of PTE tables can be recorded.
+ * 3. It ended the current iteration: it needs to reset the mm stats
+ * counters and tell its caller to increment max_seq.
*/
spin_lock(&mm_list->lock);
VM_WARN_ON_ONCE(mm_state->seq + 1 < walk->max_seq);
- VM_WARN_ON_ONCE(*iter && mm_state->seq > walk->max_seq);
- VM_WARN_ON_ONCE(*iter && !mm_state->nr_walkers);
- if (walk->max_seq <= mm_state->seq) {
- if (!*iter)
- last = false;
+ if (walk->max_seq <= mm_state->seq)
goto done;
- }
- if (!mm_state->nr_walkers) {
- VM_WARN_ON_ONCE(mm_state->head && mm_state->head != &mm_list->fifo);
+ if (!mm_state->head)
+ mm_state->head = &mm_list->fifo;
- mm_state->head = mm_list->fifo.next;
+ if (mm_state->head == &mm_list->fifo)
first = true;
- }
-
- while (!mm && mm_state->head != &mm_list->fifo) {
- mm = list_entry(mm_state->head, struct mm_struct, lru_gen.list);
+ do {
mm_state->head = mm_state->head->next;
+ if (mm_state->head == &mm_list->fifo) {
+ WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+ last = true;
+ break;
+ }
/* force scan for those added after the last iteration */
- if (!mm_state->tail || mm_state->tail == &mm->lru_gen.list) {
- mm_state->tail = mm_state->head;
+ if (!mm_state->tail || mm_state->tail == mm_state->head) {
+ mm_state->tail = mm_state->head->next;
walk->force_scan = true;
}
+ mm = list_entry(mm_state->head, struct mm_struct, lru_gen.list);
if (should_skip_mm(mm, walk))
mm = NULL;
- }
-
- if (mm_state->head == &mm_list->fifo)
- WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+ } while (!mm);
done:
- if (*iter && !mm)
- mm_state->nr_walkers--;
- if (!*iter && mm)
- mm_state->nr_walkers++;
-
- if (mm_state->nr_walkers)
- last = false;
-
if (*iter || last)
reset_mm_stats(lruvec, walk, last);
@@ -3567,9 +3548,9 @@ static bool iterate_mm_list_nowalk(struc
VM_WARN_ON_ONCE(mm_state->seq + 1 < max_seq);
- if (max_seq > mm_state->seq && !mm_state->nr_walkers) {
- VM_WARN_ON_ONCE(mm_state->head && mm_state->head != &mm_list->fifo);
-
+ if (max_seq > mm_state->seq) {
+ mm_state->head = NULL;
+ mm_state->tail = NULL;
WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
reset_mm_stats(lruvec, NULL, true);
success = true;
@@ -4172,10 +4153,6 @@ restart:
walk_pmd_range(&val, addr, next, args);
- /* a racy check to curtail the waiting time */
- if (wq_has_sleeper(&walk->lruvec->mm_state.wait))
- return 1;
-
if (need_resched() || walk->batched >= MAX_LRU_BATCH) {
end = (addr | ~PUD_MASK) + 1;
goto done;
@@ -4208,8 +4185,14 @@ static void walk_mm(struct lruvec *lruve
walk->next_addr = FIRST_USER_ADDRESS;
do {
+ DEFINE_MAX_SEQ(lruvec);
+
err = -EBUSY;
+ /* another thread might have called inc_max_seq() */
+ if (walk->max_seq != max_seq)
+ break;
+
/* folio_update_gen() requires stable folio_memcg() */
if (!mem_cgroup_trylock_pages(memcg))
break;
@@ -4444,25 +4427,12 @@ static bool try_to_inc_max_seq(struct lr
success = iterate_mm_list(lruvec, walk, &mm);
if (mm)
walk_mm(lruvec, mm, walk);
-
- cond_resched();
} while (mm);
done:
- if (!success) {
- if (sc->priority <= DEF_PRIORITY - 2)
- wait_event_killable(lruvec->mm_state.wait,
- max_seq < READ_ONCE(lrugen->max_seq));
- return false;
- }
+ if (success)
+ inc_max_seq(lruvec, can_swap, force_scan);
- VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
-
- inc_max_seq(lruvec, can_swap, force_scan);
- /* either this sees any waiters or they will see updated max_seq */
- if (wq_has_sleeper(&lruvec->mm_state.wait))
- wake_up_all(&lruvec->mm_state.wait);
-
- return true;
+ return success;
}
/******************************************************************************
@@ -6117,7 +6087,6 @@ void lru_gen_init_lruvec(struct lruvec *
INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
lruvec->mm_state.seq = MIN_NR_GENS;
- init_waitqueue_head(&lruvec->mm_state.wait);
}
#ifdef CONFIG_MEMCG
@@ -6150,7 +6119,6 @@ void lru_gen_exit_memcg(struct mem_cgrou
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
- VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));

View File

@ -1,58 +0,0 @@
From e2192de59e457aef8d1f055a452131f0b3e5d097 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Wed, 18 Jan 2023 14:26:53 +0100
Subject: [PATCH] bitfield: add FIELD_PREP_CONST()
Neither FIELD_PREP() nor *_encode_bits() can be used
in constant contexts (such as initializers), but we
don't want to define shift constants for all masks
just for use in initializers, and having checks that
the values fit is also useful.
Therefore, add FIELD_PREP_CONST() which is a smaller
version of FIELD_PREP() that can only take constant
arguments and has less friendly (but not less strict)
error checks, and expands to a constant value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230118142652.53f20593504b.Iaeea0aee77a6493d70e573b4aa55c91c00e01e4b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/linux/bitfield.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -115,6 +115,32 @@
((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask); \
})
+#define __BF_CHECK_POW2(n) BUILD_BUG_ON_ZERO(((n) & ((n) - 1)) != 0)
+
+/**
+ * FIELD_PREP_CONST() - prepare a constant bitfield element
+ * @_mask: shifted mask defining the field's length and position
+ * @_val: value to put in the field
+ *
+ * FIELD_PREP_CONST() masks and shifts up the value. The result should
+ * be combined with other fields of the bitfield using logical OR.
+ *
+ * Unlike FIELD_PREP() this is a constant expression and can therefore
+ * be used in initializers. Error checking is less comfortable for this
+ * version, and non-constant masks cannot be used.
+ */
+#define FIELD_PREP_CONST(_mask, _val) \
+ ( \
+ /* mask must be non-zero */ \
+ BUILD_BUG_ON_ZERO((_mask) == 0) + \
+ /* check if value fits */ \
+ BUILD_BUG_ON_ZERO(~((_mask) >> __bf_shf(_mask)) & (_val)) + \
+ /* check if mask is contiguous */ \
+ __BF_CHECK_POW2((_mask) + (1ULL << __bf_shf(_mask))) + \
+ /* and create the value */ \
+ (((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask)) \
+ )
+
/**
* FIELD_GET() - extract a bitfield element
* @_mask: shifted mask defining the field's length and position
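To see what this buys in practice, here is a short usage sketch with a
hypothetical register layout (the MY_CTRL_* names are made up, not from
the patch). Because FIELD_PREP_CONST() expands to a constant
expression, it works in a static initializer, where the statement
expression inside FIELD_PREP() would not compile:

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

#define MY_CTRL_MODE	GENMASK(3, 0)	/* hypothetical 4-bit field */
#define MY_CTRL_EN	BIT(7)		/* hypothetical enable bit */

/* constant context: fine with FIELD_PREP_CONST(), a build error
 * with FIELD_PREP() */
static const u32 my_ctrl_default =
	FIELD_PREP_CONST(MY_CTRL_MODE, 0x5) | MY_CTRL_EN;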

View File

@ -1,63 +0,0 @@
From 579aee9fc594af94c242068c011b0233563d4bbf Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 10 Oct 2022 16:57:21 +1100
Subject: [PATCH] powerpc: suppress some linker warnings in recent linker
versions
This is a follow-on from commit
0d362be5b142 ("Makefile: link with -z noexecstack --no-warn-rwx-segments")
for arch/powerpc/boot to address warnings like:
ld: warning: opal-calls.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions
This fixes issue https://github.com/linuxppc/issues/issues/417
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221010165721.106267e6@canb.auug.org.au
---
arch/powerpc/boot/wrapper | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -215,6 +215,11 @@ ld_version()
}'
}
+ld_is_lld()
+{
+ ${CROSS}ld -V 2>&1 | grep -q LLD
+}
+
# Do not include PT_INTERP segment when linking pie. Non-pie linking
# just ignores this option.
LD_VERSION=$(${CROSS}ld --version | ld_version)
@@ -223,6 +228,14 @@ if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VE
nodl="--no-dynamic-linker"
fi
+# suppress some warnings in recent ld versions
+nowarn="-z noexecstack"
+if ! ld_is_lld; then
+ if [ "$LD_VERSION" -ge "$(echo 2.39 | ld_version)" ]; then
+ nowarn="$nowarn --no-warn-rwx-segments"
+ fi
+fi
+
platformo=$object/"$platform".o
lds=$object/zImage.lds
ext=strip
@@ -504,7 +517,7 @@ if [ "$platform" != "miboot" ]; then
text_start="-Ttext $link_address"
fi
#link everything
- ${CROSS}ld -m $format -T $lds $text_start $pie $nodl $rodynamic $notext -o "$ofile" $map \
+ ${CROSS}ld -m $format -T $lds $text_start $pie $nodl $nowarn $rodynamic $notext -o "$ofile" $map \
$platformo $tmp $object/wrapper.a
rm $tmp
fi

View File

@ -1,65 +0,0 @@
From 63db0cb35e1cb3b3c134906d1062f65513fdda2d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= <rafal@milecki.pl>
Date: Tue, 4 Oct 2022 10:37:09 +0200
Subject: [PATCH] mtd: core: simplify (a bit) code finding
partition-matching dynamic OF node
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
1. Don't hardcode "partition-" string twice
2. Use simpler logic & use ->name to avoid of_property_read_string()
3. Use mtd_get_of_node() helper
Cc: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20221004083710.27704-1-zajec5@gmail.com
---
drivers/mtd/mtdcore.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -551,18 +551,16 @@ static void mtd_check_of_node(struct mtd
struct device_node *partitions, *parent_dn, *mtd_dn = NULL;
const char *pname, *prefix = "partition-";
int plen, mtd_name_len, offset, prefix_len;
- struct mtd_info *parent;
bool found = false;
/* Check if MTD already has a device node */
- if (dev_of_node(&mtd->dev))
+ if (mtd_get_of_node(mtd))
return;
/* Check if a partitions node exist */
if (!mtd_is_partition(mtd))
return;
- parent = mtd->parent;
- parent_dn = of_node_get(dev_of_node(&parent->dev));
+ parent_dn = of_node_get(mtd_get_of_node(mtd->parent));
if (!parent_dn)
return;
@@ -575,15 +573,15 @@ static void mtd_check_of_node(struct mtd
/* Search if a partition is defined with the same name */
for_each_child_of_node(partitions, mtd_dn) {
- offset = 0;
-
/* Skip partition with no/wrong prefix */
- if (!of_node_name_prefix(mtd_dn, "partition-"))
+ if (!of_node_name_prefix(mtd_dn, prefix))
continue;
/* Label have priority. Check that first */
- if (of_property_read_string(mtd_dn, "label", &pname)) {
- of_property_read_string(mtd_dn, "name", &pname);
+ if (!of_property_read_string(mtd_dn, "label", &pname)) {
+ offset = 0;
+ } else {
+ pname = mtd_dn->name;
offset = prefix_len;
}

View File

@ -1,84 +0,0 @@
From ddb8cefb7af288950447ca6eeeafb09977dab56f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= <rafal@milecki.pl>
Date: Tue, 4 Oct 2022 10:37:10 +0200
Subject: [PATCH] mtd: core: try to find OF node for every MTD partition
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
So far this feature was limited to the top-level "nvmem-cells" node.
There are multiple parsers creating partitions and subpartitions
dynamically. Extend that code to handle them too.
This allows finding partition-* node for every MTD (sub)partition.
Random example:
partitions {
compatible = "brcm,bcm947xx-cfe-partitions";
partition-firmware {
compatible = "brcm,trx";
partition-loader {
};
};
};
Cc: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20221004083710.27704-2-zajec5@gmail.com
---
drivers/mtd/mtdcore.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -551,20 +551,22 @@ static void mtd_check_of_node(struct mtd
struct device_node *partitions, *parent_dn, *mtd_dn = NULL;
const char *pname, *prefix = "partition-";
int plen, mtd_name_len, offset, prefix_len;
- bool found = false;
/* Check if MTD already has a device node */
if (mtd_get_of_node(mtd))
return;
- /* Check if a partitions node exist */
if (!mtd_is_partition(mtd))
return;
+
parent_dn = of_node_get(mtd_get_of_node(mtd->parent));
if (!parent_dn)
return;
- partitions = of_get_child_by_name(parent_dn, "partitions");
+ if (mtd_is_partition(mtd->parent))
+ partitions = of_node_get(parent_dn);
+ else
+ partitions = of_get_child_by_name(parent_dn, "partitions");
if (!partitions)
goto exit_parent;
@@ -588,19 +590,11 @@ static void mtd_check_of_node(struct mtd
plen = strlen(pname) - offset;
if (plen == mtd_name_len &&
!strncmp(mtd->name, pname + offset, plen)) {
- found = true;
+ mtd_set_of_node(mtd, mtd_dn);
break;
}
}
- if (!found)
- goto exit_partitions;
-
- /* Set of_node only for nvmem */
- if (of_device_is_compatible(mtd_dn, "nvmem-cells"))
- mtd_set_of_node(mtd, mtd_dn);
-
-exit_partitions:
of_node_put(partitions);
exit_parent:
of_node_put(parent_dn);

View File

@ -1,47 +0,0 @@
From 26422ac78e9d8767bd4aabfbae616b15edbf6a1b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= <rafal@milecki.pl>
Date: Sat, 22 Oct 2022 23:13:18 +0200
Subject: [PATCH] mtd: core: set ROOT_DEV for partitions marked as rootfs in DT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for "linux,rootfs" binding that is used to mark flash
partition containing rootfs. It's useful for devices using device tree
that don't have a bootloader passing root info in the cmdline.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20221022211318.32009-2-zajec5@gmail.com
---
drivers/mtd/mtdcore.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -28,6 +28,7 @@
#include <linux/leds.h>
#include <linux/debugfs.h>
#include <linux/nvmem-provider.h>
+#include <linux/root_dev.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/partitions.h>
@@ -737,6 +738,17 @@ int add_mtd_device(struct mtd_info *mtd)
not->add(mtd);
mutex_unlock(&mtd_table_mutex);
+
+ if (of_find_property(mtd_get_of_node(mtd), "linux,rootfs", NULL)) {
+ if (IS_BUILTIN(CONFIG_MTD)) {
+ pr_info("mtd: setting mtd%d (%s) as root device\n", mtd->index, mtd->name);
+ ROOT_DEV = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+ } else {
+ pr_warn("mtd: can't set mtd%d (%s) as root device - mtd must be builtin\n",
+ mtd->index, mtd->name);
+ }
+ }
+
/* We _know_ we aren't being removed, because
our caller is still holding us here. So none
of this try_ nonsense, and no bitching about it

View File

@ -1,229 +0,0 @@
From aec4d5f5ffd0f0092bd9dc21ea90e0bc237d4b74 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= <rafal@milecki.pl>
Date: Sat, 15 Oct 2022 11:29:50 +0200
Subject: [PATCH] mtd: parsers: add TP-Link SafeLoader partitions table parser
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This parser deals with most TP-Link home routers. It reads info about
partitions and registers them in the MTD subsystem.
Example from TP-Link Archer C5 V2:
spi-nor spi0.0: s25fl128s1 (16384 Kbytes)
15 tplink-safeloader partitions found on MTD device spi0.0
Creating 15 MTD partitions on "spi0.0":
0x000000000000-0x000000040000 : "fs-uboot"
0x000000040000-0x000000440000 : "os-image"
0x000000440000-0x000000e40000 : "rootfs"
0x000000e40000-0x000000e40200 : "default-mac"
0x000000e40200-0x000000e40400 : "pin"
0x000000e40400-0x000000e40600 : "product-info"
0x000000e50000-0x000000e60000 : "partition-table"
0x000000e60000-0x000000e60200 : "soft-version"
0x000000e61000-0x000000e70000 : "support-list"
0x000000e70000-0x000000e80000 : "profile"
0x000000e80000-0x000000e90000 : "default-config"
0x000000e90000-0x000000ee0000 : "user-config"
0x000000ee0000-0x000000fe0000 : "log"
0x000000fe0000-0x000000ff0000 : "radio_bk"
0x000000ff0000-0x000001000000 : "radio"
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20221015092950.27467-2-zajec5@gmail.com
---
drivers/mtd/parsers/Kconfig | 15 +++
drivers/mtd/parsers/Makefile | 1 +
drivers/mtd/parsers/tplink_safeloader.c | 150 ++++++++++++++++++++++++
3 files changed, 166 insertions(+)
create mode 100644 drivers/mtd/parsers/tplink_safeloader.c
--- a/drivers/mtd/parsers/Kconfig
+++ b/drivers/mtd/parsers/Kconfig
@@ -123,6 +123,21 @@ config MTD_AFS_PARTS
for your particular device. It won't happen automatically. The
'physmap' map driver (CONFIG_MTD_PHYSMAP) does this, for example.
+config MTD_PARSER_TPLINK_SAFELOADER
+ tristate "TP-Link Safeloader partitions parser"
+ depends on MTD && (ARCH_BCM_5301X || ATH79 || SOC_MT7620 || SOC_MT7621 || COMPILE_TEST)
+ help
+ TP-Link home routers use flash partitions to store various data. Info
+ about flash space layout is stored in a partitions table using a
+ custom ASCII-based format.
+
+ That format was first found in devices with SafeLoader bootloader and
+ was named after it. Later it was adapted to CFE and U-Boot
+ bootloaders.
+
+ This driver reads partitions table, parses it and creates MTD
+ partitions.
+
config MTD_PARSER_TRX
tristate "Parser for TRX format partitions"
depends on MTD && (BCM47XX || ARCH_BCM_5301X || ARCH_MEDIATEK || RALINK || COMPILE_TEST)
--- a/drivers/mtd/parsers/Makefile
+++ b/drivers/mtd/parsers/Makefile
@@ -10,6 +10,7 @@ ofpart-$(CONFIG_MTD_OF_PARTS_BCM4908) +=
ofpart-$(CONFIG_MTD_OF_PARTS_LINKSYS_NS)+= ofpart_linksys_ns.o
obj-$(CONFIG_MTD_PARSER_IMAGETAG) += parser_imagetag.o
obj-$(CONFIG_MTD_AFS_PARTS) += afs.o
+obj-$(CONFIG_MTD_PARSER_TPLINK_SAFELOADER) += tplink_safeloader.o
obj-$(CONFIG_MTD_PARSER_TRX) += parser_trx.o
obj-$(CONFIG_MTD_SERCOMM_PARTS) += scpart.o
obj-$(CONFIG_MTD_SHARPSL_PARTS) += sharpslpart.o
--- /dev/null
+++ b/drivers/mtd/parsers/tplink_safeloader.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2022 Rafał Miłecki <rafal@milecki.pl>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/partitions.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+
+#define TPLINK_SAFELOADER_DATA_OFFSET 4
+#define TPLINK_SAFELOADER_MAX_PARTS 32
+
+struct safeloader_cmn_header {
+ __be32 size;
+ uint32_t unused;
+} __packed;
+
+static void *mtd_parser_tplink_safeloader_read_table(struct mtd_info *mtd)
+{
+ struct safeloader_cmn_header hdr;
+ struct device_node *np;
+ size_t bytes_read;
+ size_t offset;
+ size_t size;
+ char *buf;
+ int err;
+
+ np = mtd_get_of_node(mtd);
+ if (mtd_is_partition(mtd))
+ of_node_get(np);
+ else
+ np = of_get_child_by_name(np, "partitions");
+
+ if (of_property_read_u32(np, "partitions-table-offset", (u32 *)&offset)) {
+ pr_err("Failed to get partitions table offset\n");
+ goto err_put;
+ }
+
+ err = mtd_read(mtd, offset, sizeof(hdr), &bytes_read, (uint8_t *)&hdr);
+ if (err && !mtd_is_bitflip(err)) {
+ pr_err("Failed to read from %s at 0x%zx\n", mtd->name, offset);
+ goto err_put;
+ }
+
+ size = be32_to_cpu(hdr.size);
+
+ buf = kmalloc(size + 1, GFP_KERNEL);
+ if (!buf)
+ goto err_put;
+
+ err = mtd_read(mtd, offset + sizeof(hdr), size, &bytes_read, buf);
+ if (err && !mtd_is_bitflip(err)) {
+ pr_err("Failed to read from %s at 0x%zx\n", mtd->name, offset + sizeof(hdr));
+ goto err_kfree;
+ }
+
+ buf[size] = '\0';
+
+ of_node_put(np);
+
+ return buf;
+
+err_kfree:
+ kfree(buf);
+err_put:
+ of_node_put(np);
+ return NULL;
+}
+
+static int mtd_parser_tplink_safeloader_parse(struct mtd_info *mtd,
+ const struct mtd_partition **pparts,
+ struct mtd_part_parser_data *data)
+{
+ struct mtd_partition *parts;
+ char name[65];
+ size_t offset;
+ size_t bytes;
+ char *buf;
+ int idx;
+ int err;
+
+ parts = kcalloc(TPLINK_SAFELOADER_MAX_PARTS, sizeof(*parts), GFP_KERNEL);
+ if (!parts) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ buf = mtd_parser_tplink_safeloader_read_table(mtd);
+ if (!buf) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ for (idx = 0, offset = TPLINK_SAFELOADER_DATA_OFFSET;
+ idx < TPLINK_SAFELOADER_MAX_PARTS &&
+ sscanf(buf + offset, "partition %64s base 0x%llx size 0x%llx%zn\n",
+ name, &parts[idx].offset, &parts[idx].size, &bytes) == 3;
+ idx++, offset += bytes + 1) {
+ parts[idx].name = kstrdup(name, GFP_KERNEL);
+ if (!parts[idx].name) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+ }
+
+ if (idx == TPLINK_SAFELOADER_MAX_PARTS)
+ pr_warn("Reached maximum number of partitions!\n");
+
+ kfree(buf);
+
+ *pparts = parts;
+
+ return idx;
+
+err_free:
+ for (idx -= 1; idx >= 0; idx--)
+ kfree(parts[idx].name);
+err_out:
+ return err;
+};
+
+static void mtd_parser_tplink_safeloader_cleanup(const struct mtd_partition *pparts,
+ int nr_parts)
+{
+ int i;
+
+ for (i = 0; i < nr_parts; i++)
+ kfree(pparts[i].name);
+
+ kfree(pparts);
+}
+
+static const struct of_device_id mtd_parser_tplink_safeloader_of_match_table[] = {
+ { .compatible = "tplink,safeloader-partitions" },
+ {},
+};
+MODULE_DEVICE_TABLE(of, mtd_parser_tplink_safeloader_of_match_table);
+
+static struct mtd_part_parser mtd_parser_tplink_safeloader = {
+ .parse_fn = mtd_parser_tplink_safeloader_parse,
+ .cleanup = mtd_parser_tplink_safeloader_cleanup,
+ .name = "tplink-safeloader",
+ .of_match_table = mtd_parser_tplink_safeloader_of_match_table,
+};
+module_mtd_part_parser(mtd_parser_tplink_safeloader);
+
+MODULE_LICENSE("GPL");

View File

@ -1,35 +0,0 @@
From ebed787a0becb9354f0a23620a5130cccd6c730c Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Thu, 19 Jan 2023 03:45:43 +0000
Subject: [PATCH] mtd: spinand: macronix: use scratch buffer for DMA operation
The mx35lf1ge4ab_get_eccsr() function uses an SPI DMA operation to
read the eccsr, hence the buffer should not be on the stack. Since commit
380583227c0c7f ("spi: spi-mem: Add extra sanity checks on the op param")
the kernel emits a warning and blocks such operations.
Use the scratch buffer to get the eccsr instead of trying to read
directly into a stack-allocated variable.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/Y8i85zM0u4XdM46z@makrotopia.org
---
drivers/mtd/nand/spi/macronix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/spi/macronix.c
+++ b/drivers/mtd/nand/spi/macronix.c
@@ -83,9 +83,10 @@ static int mx35lf1ge4ab_ecc_get_status(s
* in order to avoid forcing the wear-leveling layer to move
* data around if it's not necessary.
*/
- if (mx35lf1ge4ab_get_eccsr(spinand, &eccsr))
+ if (mx35lf1ge4ab_get_eccsr(spinand, spinand->scratchbuf))
return nanddev_get_ecc_conf(nand)->strength;
+ eccsr = *spinand->scratchbuf;
if (WARN_ON(eccsr > nanddev_get_ecc_conf(nand)->strength ||
!eccsr))
return nanddev_get_ecc_conf(nand)->strength;

View File

@ -1,33 +0,0 @@
From 1ecf9e390452e73a362ea7fbde8f3f0db83de856 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Thu, 22 Dec 2022 19:33:04 +0000
Subject: [PATCH] mtd: ubi: wire-up parent MTD device
Wire up the device parent pointer of UBI devices to their lower MTD
device, typically an MTD partition or whole-chip device.
The most noticeable change is in sysfs: previously, UBI devices could
be found in /sys/devices/virtual/ubi, while after this change they are
correctly attached to their parent MTD device, e.g.
/sys/devices/platform/1100d000.spi/spi_master/spi1/spi1.0/mtd/mtd2/ubi0.
Locating UBI devices using /sys/class/ubi/ of course still works as
well.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
drivers/mtd/ubi/build.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -929,6 +929,7 @@ int ubi_attach_mtd_dev(struct mtd_info *
ubi->dev.release = dev_release;
ubi->dev.class = &ubi_class;
ubi->dev.groups = ubi_dev_groups;
+ ubi->dev.parent = &mtd->dev;
ubi->mtd = mtd;
ubi->ubi_num = ubi_num;

View File

@ -1,49 +0,0 @@
From 05b8773ca33253ea562be145cf3145b05ef19f86 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Thu, 22 Dec 2022 19:33:31 +0000
Subject: [PATCH] mtd: ubi: block: wire-up device parent
ubiblock devices were previously only identifiable by their name, but
not connected to their parent UBI volume device, e.g. in sysfs.
Properly parent ubiblock devices as descendants of a UBI volume device
to reflect the device model hierarchy.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
drivers/mtd/ubi/block.c | 2 +-
drivers/mtd/ubi/kapi.c | 1 +
include/linux/mtd/ubi.h | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -452,7 +452,7 @@ int ubiblock_create(struct ubi_volume_in
list_add_tail(&dev->list, &ubiblock_devices);
/* Must be the last step: anyone can call file ops from now on */
- ret = add_disk(dev->gd);
+ ret = device_add_disk(vi->dev, dev->gd, NULL);
if (ret)
goto out_destroy_wq;
--- a/drivers/mtd/ubi/kapi.c
+++ b/drivers/mtd/ubi/kapi.c
@@ -79,6 +79,7 @@ void ubi_do_get_volume_info(struct ubi_d
vi->name_len = vol->name_len;
vi->name = vol->name;
vi->cdev = vol->cdev.dev;
+ vi->dev = &vol->dev;
}
/**
--- a/include/linux/mtd/ubi.h
+++ b/include/linux/mtd/ubi.h
@@ -110,6 +110,7 @@ struct ubi_volume_info {
int name_len;
const char *name;
dev_t cdev;
+ struct device *dev;
};
/**

View File

@ -1,47 +0,0 @@
From 281f7a6c1a33fffcde32001bacbb4f672140fbf9 Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Wed, 8 Mar 2023 09:20:21 +0100
Subject: [PATCH] mtd: core: prepare mtd_otp_nvmem_add() to handle
-EPROBE_DEFER
NVMEM will soon gain support for nvmem layouts, and these might
not be ready when nvmem_register() is called, in which case it might
return -EPROBE_DEFER. Don't print the error message in this case.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230308082021.870459-4-michael@walle.cc
---
drivers/mtd/mtdcore.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -953,8 +953,8 @@ static int mtd_otp_nvmem_add(struct mtd_
nvmem = mtd_otp_nvmem_register(mtd, "user-otp", size,
mtd_nvmem_user_otp_reg_read);
if (IS_ERR(nvmem)) {
- dev_err(dev, "Failed to register OTP NVMEM device\n");
- return PTR_ERR(nvmem);
+ err = PTR_ERR(nvmem);
+ goto err;
}
mtd->otp_user_nvmem = nvmem;
}
@@ -971,7 +971,6 @@ static int mtd_otp_nvmem_add(struct mtd_
nvmem = mtd_otp_nvmem_register(mtd, "factory-otp", size,
mtd_nvmem_fact_otp_reg_read);
if (IS_ERR(nvmem)) {
- dev_err(dev, "Failed to register OTP NVMEM device\n");
err = PTR_ERR(nvmem);
goto err;
}
@@ -983,7 +982,7 @@ static int mtd_otp_nvmem_add(struct mtd_
err:
nvmem_unregister(mtd->otp_user_nvmem);
- return err;
+ return dev_err_probe(dev, err, "Failed to register OTP NVMEM device\n");
}
/**
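The resulting pattern is worth a glance in isolation. Below is a
generic sketch (not the mtdcore code; the function and variable names
are made up): dev_err_probe() always returns the error it is given,
logs it via dev_err() for real failures, and stays quiet for
-EPROBE_DEFER, recording the deferral reason instead:

#include <linux/device.h>
#include <linux/err.h>

static int demo_register(struct device *dev, void *handle)
{
	if (IS_ERR(handle))
		/* silent on -EPROBE_DEFER, logged otherwise; the error
		 * code is passed through unchanged */
		return dev_err_probe(dev, PTR_ERR(handle),
				     "Failed to register OTP NVMEM device\n");
	return 0;
}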

View File

@ -1,41 +0,0 @@
From 7390609b0121a1b982c5ecdfcd72dc328e5784ee Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Mon, 6 Feb 2023 13:43:42 +0000
Subject: [PATCH] net: add helper eth_addr_add()
Add a helper to add an offset to an Ethernet address. This comes in handy
if you have a base Ethernet address for multiple interfaces.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230206134356.839737-9-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/etherdevice.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -508,6 +508,20 @@ static inline void eth_addr_inc(u8 *addr
}
/**
+ * eth_addr_add() - Add (or subtract) an offset to/from the given MAC address.
+ *
+ * @offset: Offset to add.
+ * @addr: Pointer to a six-byte array containing Ethernet address to increment.
+ */
+static inline void eth_addr_add(u8 *addr, long offset)
+{
+ u64 u = ether_addr_to_u64(addr);
+
+ u += offset;
+ u64_to_ether_addr(u, addr);
+}
+
+/**
* is_etherdev_addr - Tell if given Ethernet address belongs to the device.
* @dev: Pointer to a device structure
* @addr: Pointer to a six-byte array containing the Ethernet address
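A short usage sketch (the helper and surrounding names below are made
up; only eth_addr_add() and ether_addr_copy() are real): deriving
per-port MAC addresses from a single base address, e.g. one read from
an EEPROM:

#include <linux/etherdevice.h>

static void port_mac_from_base(u8 *out, const u8 *base, int port)
{
	ether_addr_copy(out, base);	/* start from the base MAC */
	eth_addr_add(out, port);	/* MAC for port N = base + N */
}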

View File

@ -1,32 +0,0 @@
From d387e34fec407f881fdf165b5d7ec128ebff362f Mon Sep 17 00:00:00 2001
From: Christian Marangi <ansuelsmth@gmail.com>
Date: Tue, 19 Sep 2023 14:47:20 +0200
Subject: [PATCH] net: sfp: add quirk for Fiberstore GPON-ONU-34-20BI
The Fiberstore GPON-ONU-34-20BI can operate at 2500base-X, but reports
1.2GBd NRZ in its EEPROM.
The module also requires the ignore-tx-fault fixup, similar to the
Huawei MA5671A, as it gets disabled on error messages with serial
redirection enabled.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20230919124720.8210-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
drivers/net/phy/sfp.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -393,6 +393,11 @@ static const struct sfp_quirk sfp_quirks
SFP_QUIRK("ALCATELLUCENT", "3FE46541AA", sfp_quirk_2500basex,
sfp_fixup_long_startup),
+ // Fiberstore GPON-ONU-34-20BI can operate at 2500base-X, but report 1.2GBd
+ // NRZ in their EEPROM
+ SFP_QUIRK("FS", "GPON-ONU-34-20BI", sfp_quirk_2500basex,
+ sfp_fixup_ignore_tx_fault),
+
SFP_QUIRK_F("HALNy", "HL-GSFP", sfp_fixup_halny_gsfp),
// HG MXPD-483II-F 2.5G supports 2500Base-X, but incorrectly reports

View File

@ -1,326 +0,0 @@
From 4e4aafcddbbfcdd6eed5780e190fcbfac8b4685a Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 9 Jan 2023 16:30:41 +0100
Subject: [PATCH] net: mdio: Add dedicated C45 API to MDIO bus drivers
Currently, C22 and C45 transactions are mixed over combined API calls
which make use of a special bit in the reg address to indicate if a
C45 transaction should be performed. This makes it impossible to know
if the bus driver actually supports C45. Additionally, many C22-only
drivers don't return -EOPNOTSUPP when asked to perform a C45
transaction; they mistakenly perform a C22 transaction.
This is the first step to cleanly separate C22 from C45. To maintain
backwards compatibility until all drivers which are capable of
performing C45 are converted to this new API, the helper functions
will fall back to the older API if the new API is not
supported. Eventually this fallback will be removed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/mdio_bus.c | 189 +++++++++++++++++++++++++++++++++++++
include/linux/mdio.h | 39 ++++----
include/linux/phy.h | 5 +
3 files changed, 214 insertions(+), 19 deletions(-)
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -832,6 +832,100 @@ int __mdiobus_modify_changed(struct mii_
EXPORT_SYMBOL_GPL(__mdiobus_modify_changed);
/**
+ * __mdiobus_c45_read - Unlocked version of the mdiobus_c45_read function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to read
+ *
+ * Read a MDIO bus register. Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_c45_read(struct mii_bus *bus, int addr, int devad, u32 regnum)
+{
+ int retval;
+
+ lockdep_assert_held_once(&bus->mdio_lock);
+
+ if (bus->read_c45)
+ retval = bus->read_c45(bus, addr, devad, regnum);
+ else
+ retval = bus->read(bus, addr, mdiobus_c45_addr(devad, regnum));
+
+ trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+ mdiobus_stats_acct(&bus->stats[addr], true, retval);
+
+ return retval;
+}
+EXPORT_SYMBOL(__mdiobus_c45_read);
+
+/**
+ * __mdiobus_c45_write - Unlocked version of the mdiobus_write function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * Write a MDIO bus register. Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_c45_write(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 val)
+{
+ int err;
+
+ lockdep_assert_held_once(&bus->mdio_lock);
+
+ if (bus->write_c45)
+ err = bus->write_c45(bus, addr, devad, regnum, val);
+ else
+ err = bus->write(bus, addr, mdiobus_c45_addr(devad, regnum),
+ val);
+
+ trace_mdio_access(bus, 0, addr, regnum, val, err);
+ mdiobus_stats_acct(&bus->stats[addr], false, err);
+
+ return err;
+}
+EXPORT_SYMBOL(__mdiobus_c45_write);
+
+/**
+ * __mdiobus_c45_modify_changed - Unlocked version of the mdiobus_modify function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to modify
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ *
+ * Read, modify, and if any change, write the register value back to the
+ * device. Any error returns a negative number.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+static int __mdiobus_c45_modify_changed(struct mii_bus *bus, int addr,
+ int devad, u32 regnum, u16 mask,
+ u16 set)
+{
+ int new, ret;
+
+ ret = __mdiobus_c45_read(bus, addr, devad, regnum);
+ if (ret < 0)
+ return ret;
+
+ new = (ret & ~mask) | set;
+ if (new == ret)
+ return 0;
+
+ ret = __mdiobus_c45_write(bus, addr, devad, regnum, new);
+
+ return ret < 0 ? ret : 1;
+}
+
+/**
* mdiobus_read_nested - Nested version of the mdiobus_read function
* @bus: the mii_bus struct
* @addr: the phy address
@@ -879,6 +973,29 @@ int mdiobus_read(struct mii_bus *bus, in
EXPORT_SYMBOL(mdiobus_read);
/**
+ * mdiobus_c45_read - Convenience function for reading a given MII mgmt register
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to read
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_c45_read(struct mii_bus *bus, int addr, int devad, u32 regnum)
+{
+ int retval;
+
+ mutex_lock(&bus->mdio_lock);
+ retval = __mdiobus_c45_read(bus, addr, devad, regnum);
+ mutex_unlock(&bus->mdio_lock);
+
+ return retval;
+}
+EXPORT_SYMBOL(mdiobus_c45_read);
+
+/**
* mdiobus_write_nested - Nested version of the mdiobus_write function
* @bus: the mii_bus struct
* @addr: the phy address
@@ -928,6 +1045,31 @@ int mdiobus_write(struct mii_bus *bus, i
EXPORT_SYMBOL(mdiobus_write);
/**
+ * mdiobus_c45_write - Convenience function for writing a given MII mgmt register
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_c45_write(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 val)
+{
+ int err;
+
+ mutex_lock(&bus->mdio_lock);
+ err = __mdiobus_c45_write(bus, addr, devad, regnum, val);
+ mutex_unlock(&bus->mdio_lock);
+
+ return err;
+}
+EXPORT_SYMBOL(mdiobus_c45_write);
+
+/**
* mdiobus_modify - Convenience function for modifying a given mdio device
* register
* @bus: the mii_bus struct
@@ -949,6 +1091,30 @@ int mdiobus_modify(struct mii_bus *bus,
EXPORT_SYMBOL_GPL(mdiobus_modify);
/**
+ * mdiobus_c45_modify - Convenience function for modifying a given mdio device
+ * register
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to write
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ */
+int mdiobus_c45_modify(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 mask, u16 set)
+{
+ int err;
+
+ mutex_lock(&bus->mdio_lock);
+ err = __mdiobus_c45_modify_changed(bus, addr, devad, regnum,
+ mask, set);
+ mutex_unlock(&bus->mdio_lock);
+
+ return err < 0 ? err : 0;
+}
+EXPORT_SYMBOL_GPL(mdiobus_c45_modify);
+
+/**
* mdiobus_modify_changed - Convenience function for modifying a given mdio
* device register and returning if it changed
* @bus: the mii_bus struct
@@ -971,6 +1137,29 @@ int mdiobus_modify_changed(struct mii_bu
EXPORT_SYMBOL_GPL(mdiobus_modify_changed);
/**
+ * mdiobus_c45_modify_changed - Convenience function for modifying a given mdio
+ * device register and returning if it changed
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @devad: device address to read
+ * @regnum: register number to write
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ */
+int mdiobus_c45_modify_changed(struct mii_bus *bus, int devad, int addr,
+ u32 regnum, u16 mask, u16 set)
+{
+ int err;
+
+ mutex_lock(&bus->mdio_lock);
+ err = __mdiobus_c45_modify_changed(bus, addr, devad, regnum, mask, set);
+ mutex_unlock(&bus->mdio_lock);
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(mdiobus_c45_modify_changed);
+
+/**
* mdio_bus_match - determine if given MDIO driver supports the given
* MDIO device
* @dev: target MDIO device
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -423,6 +423,17 @@ int mdiobus_modify(struct mii_bus *bus,
u16 set);
int mdiobus_modify_changed(struct mii_bus *bus, int addr, u32 regnum,
u16 mask, u16 set);
+int __mdiobus_c45_read(struct mii_bus *bus, int addr, int devad, u32 regnum);
+int mdiobus_c45_read(struct mii_bus *bus, int addr, int devad, u32 regnum);
+int __mdiobus_c45_write(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 val);
+int mdiobus_c45_write(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 val);
+int mdiobus_c45_modify(struct mii_bus *bus, int addr, int devad, u32 regnum,
+ u16 mask, u16 set);
+
+int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad,
+ u32 regnum, u16 mask, u16 set);
static inline int mdiodev_read(struct mdio_device *mdiodev, u32 regnum)
{
@@ -463,29 +474,19 @@ static inline u16 mdiobus_c45_devad(u32
return FIELD_GET(MII_DEVADDR_C45_MASK, regnum);
}
-static inline int __mdiobus_c45_read(struct mii_bus *bus, int prtad, int devad,
- u16 regnum)
-{
- return __mdiobus_read(bus, prtad, mdiobus_c45_addr(devad, regnum));
-}
-
-static inline int __mdiobus_c45_write(struct mii_bus *bus, int prtad, int devad,
- u16 regnum, u16 val)
-{
- return __mdiobus_write(bus, prtad, mdiobus_c45_addr(devad, regnum),
- val);
-}
-
-static inline int mdiobus_c45_read(struct mii_bus *bus, int prtad, int devad,
- u16 regnum)
+static inline int mdiodev_c45_modify(struct mdio_device *mdiodev, int devad,
+ u32 regnum, u16 mask, u16 set)
{
- return mdiobus_read(bus, prtad, mdiobus_c45_addr(devad, regnum));
+ return mdiobus_c45_modify(mdiodev->bus, mdiodev->addr, devad, regnum,
+ mask, set);
}
-static inline int mdiobus_c45_write(struct mii_bus *bus, int prtad, int devad,
- u16 regnum, u16 val)
+static inline int mdiodev_c45_modify_changed(struct mdio_device *mdiodev,
+ int devad, u32 regnum, u16 mask,
+ u16 set)
{
- return mdiobus_write(bus, prtad, mdiobus_c45_addr(devad, regnum), val);
+ return mdiobus_c45_modify_changed(mdiodev->bus, mdiodev->addr, devad,
+ regnum, mask, set);
}
int mdiobus_register_device(struct mdio_device *mdiodev);
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -364,6 +364,11 @@ struct mii_bus {
int (*read)(struct mii_bus *bus, int addr, int regnum);
/** @write: Perform a write transfer on the bus */
int (*write)(struct mii_bus *bus, int addr, int regnum, u16 val);
+ /** @read_c45: Perform a C45 read transfer on the bus */
+ int (*read_c45)(struct mii_bus *bus, int addr, int devnum, int regnum);
+ /** @write_c45: Perform a C45 write transfer on the bus */
+ int (*write_c45)(struct mii_bus *bus, int addr, int devnum,
+ int regnum, u16 val);
/** @reset: Perform a reset of the bus */
int (*reset)(struct mii_bus *bus);

View File

@ -1,105 +0,0 @@
From 9a0e95e34e9c0a713ddfd48c3a88a20d2bdfd514 Mon Sep 17 00:00:00 2001
From: Gabor Juhos <j4g8y7@gmail.com>
Date: Fri, 11 Aug 2023 13:10:07 +0200
Subject: [PATCH] net: phy: Introduce PSGMII PHY interface mode
The PSGMII interface is similar to QSGMII. The main difference
is that the PSGMII interface combines five SGMII lines into a
single link while in QSGMII only four lines are combined.
Similarly to QSGMII, this interface mode might also need
special handling within the MAC driver.
It is commonly used by Qualcomm with their QCA807x PHY series and
modern WiSoCs.
Add definitions for the PHY layer to allow to express this type
of connection between the MAC and PHY.
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
Documentation/networking/phy.rst | 4 ++++
drivers/net/phy/phy-core.c | 2 ++
drivers/net/phy/phylink.c | 3 +++
include/linux/phy.h | 4 ++++
4 files changed, 13 insertions(+)
--- a/Documentation/networking/phy.rst
+++ b/Documentation/networking/phy.rst
@@ -323,6 +323,10 @@ Some of the interface modes are describe
contrast with the 1000BASE-X phy mode used for Clause 38 and 39 PMDs, this
interface mode has different autonegotiation and only supports full duplex.
+``PHY_INTERFACE_MODE_PSGMII``
+ This is the Penta SGMII mode, it is similar to QSGMII but it combines 5
+ SGMII lines into a single link compared to 4 on QSGMII.
+
Pause frames / flow control
===========================
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -140,6 +140,8 @@ int phy_interface_num_ports(phy_interfac
case PHY_INTERFACE_MODE_QSGMII:
case PHY_INTERFACE_MODE_QUSGMII:
return 4;
+ case PHY_INTERFACE_MODE_PSGMII:
+ return 5;
case PHY_INTERFACE_MODE_MAX:
WARN_ONCE(1, "PHY_INTERFACE_MODE_MAX isn't a valid interface mode");
return 0;
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -187,6 +187,7 @@ static int phylink_interface_max_speed(p
case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_PSGMII:
case PHY_INTERFACE_MODE_QSGMII:
case PHY_INTERFACE_MODE_QUSGMII:
case PHY_INTERFACE_MODE_SGMII:
@@ -448,6 +449,7 @@ unsigned long phylink_get_capabilities(p
case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_PSGMII:
case PHY_INTERFACE_MODE_QSGMII:
case PHY_INTERFACE_MODE_QUSGMII:
case PHY_INTERFACE_MODE_SGMII:
@@ -814,6 +816,7 @@ static int phylink_parse_mode(struct phy
switch (pl->link_config.interface) {
case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_PSGMII:
case PHY_INTERFACE_MODE_QSGMII:
case PHY_INTERFACE_MODE_QUSGMII:
case PHY_INTERFACE_MODE_RGMII:
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -103,6 +103,7 @@ extern const int phy_10gbit_features_arr
* @PHY_INTERFACE_MODE_XGMII: 10 gigabit media-independent interface
* @PHY_INTERFACE_MODE_XLGMII:40 gigabit media-independent interface
* @PHY_INTERFACE_MODE_MOCA: Multimedia over Coax
+ * @PHY_INTERFACE_MODE_PSGMII: Penta SGMII
* @PHY_INTERFACE_MODE_QSGMII: Quad SGMII
* @PHY_INTERFACE_MODE_TRGMII: Turbo RGMII
* @PHY_INTERFACE_MODE_100BASEX: 100 BaseX
@@ -140,6 +141,7 @@ typedef enum {
PHY_INTERFACE_MODE_XGMII,
PHY_INTERFACE_MODE_XLGMII,
PHY_INTERFACE_MODE_MOCA,
+ PHY_INTERFACE_MODE_PSGMII,
PHY_INTERFACE_MODE_QSGMII,
PHY_INTERFACE_MODE_TRGMII,
PHY_INTERFACE_MODE_100BASEX,
@@ -247,6 +249,8 @@ static inline const char *phy_modes(phy_
return "xlgmii";
case PHY_INTERFACE_MODE_MOCA:
return "moca";
+ case PHY_INTERFACE_MODE_PSGMII:
+ return "psgmii";
case PHY_INTERFACE_MODE_QSGMII:
return "qsgmii";
case PHY_INTERFACE_MODE_TRGMII:
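A tiny sketch of how the new mode surfaces to code (illustrative only;
both helpers appear in the hunks above):

#include <linux/phy.h>
#include <linux/printk.h>

static void show_psgmii_info(void)
{
	phy_interface_t mode = PHY_INTERFACE_MODE_PSGMII;

	/* phy_modes() maps the enum to the DT string "psgmii";
	 * phy_interface_num_ports() reports the 5 combined lines */
	pr_info("phy-mode \"%s\" carries %d ports\n",
		phy_modes(mode), phy_interface_num_ports(mode));
}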

View File

@ -1,32 +0,0 @@
From a593a2fcfdfb92cfd0ffc54bc81b07e6bfaaaf46 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Thu, 16 Mar 2023 14:08:26 +0200
Subject: [PATCH] net: phy: at803x: Replace of_gpio.h with what indeed is used
of_gpio.h in this driver is solely used as a proxy to other headers.
This is incorrect usage of of_gpio.h. Replace it with the headers
that are actually used in the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -13,12 +13,11 @@
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/ethtool_netlink.h>
-#include <linux/of_gpio.h>
#include <linux/bitfield.h>
-#include <linux/gpio/consumer.h>
#include <linux/regulator/of_regulator.h>
#include <linux/regulator/driver.h>
#include <linux/regulator/consumer.h>
+#include <linux/of.h>
#include <linux/phylink.h>
#include <linux/sfp.h>
#include <dt-bindings/net/qca-ar803x.h>

View File

@ -1,70 +0,0 @@
From 8b8bc13d89a7d23d14b0e041c73f037c9db997b1 Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:19 +0800
Subject: [PATCH 1/6] net: phy: at803x: support qca8081
genphy_c45_pma_read_abilities
The qca8081 PHY supports using genphy_c45_pma_read_abilities() to get
the supported PHY features, except for the autoneg ability, which
exists in MDIO_STAT1 instead of MMD7.1; add it manually after calling
genphy_c45_pma_read_abilities().
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -902,15 +902,6 @@ static int at803x_get_features(struct ph
if (err)
return err;
- if (phydev->drv->phy_id == QCA8081_PHY_ID) {
- err = phy_read_mmd(phydev, MDIO_MMD_PMAPMD, MDIO_PMA_NG_EXTABLE);
- if (err < 0)
- return err;
-
- linkmode_mod_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT, phydev->supported,
- err & MDIO_PMA_NG_EXTABLE_2_5GBT);
- }
-
if (phydev->drv->phy_id != ATH8031_PHY_ID)
return 0;
@@ -1996,6 +1987,23 @@ static int qca808x_cable_test_get_status
return 0;
}
+static int qca808x_get_features(struct phy_device *phydev)
+{
+ int ret;
+
+ ret = genphy_c45_pma_read_abilities(phydev);
+ if (ret)
+ return ret;
+
+ /* The autoneg ability is not existed in bit3 of MMD7.1,
+ * but it is supported by qca808x PHY, so we add it here
+ * manually.
+ */
+ linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported);
+
+ return 0;
+}
+
static struct phy_driver at803x_driver[] = {
{
/* Qualcomm Atheros AR8035 */
@@ -2163,7 +2171,7 @@ static struct phy_driver at803x_driver[]
.set_tunable = at803x_set_tunable,
.set_wol = at803x_set_wol,
.get_wol = at803x_get_wol,
- .get_features = at803x_get_features,
+ .get_features = qca808x_get_features,
.config_aneg = at803x_config_aneg,
.suspend = genphy_suspend,
.resume = genphy_resume,

View File

@ -1,73 +0,0 @@
From f3db55ae860a82e1224a909072783ef850e5d228 Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:20 +0800
Subject: [PATCH 2/6] net: phy: at803x: merge qca8081 slave seed function
Merge the seed enablement and seed value configuration into
one function, since the random seed value needs to be
configured when the seed is enabled.
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 29 +++++++++--------------------
1 file changed, 9 insertions(+), 20 deletions(-)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -1730,24 +1730,19 @@ static int qca808x_phy_fast_retrain_conf
return 0;
}
-static int qca808x_phy_ms_random_seed_set(struct phy_device *phydev)
-{
- u16 seed_value = prandom_u32_max(QCA808X_MASTER_SLAVE_SEED_RANGE);
-
- return at803x_debug_reg_mask(phydev, QCA808X_PHY_DEBUG_LOCAL_SEED,
- QCA808X_MASTER_SLAVE_SEED_CFG,
- FIELD_PREP(QCA808X_MASTER_SLAVE_SEED_CFG, seed_value));
-}
-
static int qca808x_phy_ms_seed_enable(struct phy_device *phydev, bool enable)
{
- u16 seed_enable = 0;
+ u16 seed_value;
- if (enable)
- seed_enable = QCA808X_MASTER_SLAVE_SEED_ENABLE;
+ if (!enable)
+ return at803x_debug_reg_mask(phydev, QCA808X_PHY_DEBUG_LOCAL_SEED,
+ QCA808X_MASTER_SLAVE_SEED_ENABLE, 0);
+ seed_value = prandom_u32_max(QCA808X_MASTER_SLAVE_SEED_RANGE);
return at803x_debug_reg_mask(phydev, QCA808X_PHY_DEBUG_LOCAL_SEED,
- QCA808X_MASTER_SLAVE_SEED_ENABLE, seed_enable);
+ QCA808X_MASTER_SLAVE_SEED_CFG | QCA808X_MASTER_SLAVE_SEED_ENABLE,
+ FIELD_PREP(QCA808X_MASTER_SLAVE_SEED_CFG, seed_value) |
+ QCA808X_MASTER_SLAVE_SEED_ENABLE);
}
static int qca808x_config_init(struct phy_device *phydev)
@@ -1771,12 +1766,7 @@ static int qca808x_config_init(struct ph
if (ret)
return ret;
- /* Configure lower ramdom seed to make phy linked as slave mode */
- ret = qca808x_phy_ms_random_seed_set(phydev);
- if (ret)
- return ret;
-
- /* Enable seed */
+ /* Enable seed and configure lower ramdom seed to make phy linked as slave mode */
ret = qca808x_phy_ms_seed_enable(phydev, true);
if (ret)
return ret;
@@ -1821,7 +1811,6 @@ static int qca808x_read_status(struct ph
if (phydev->master_slave_state == MASTER_SLAVE_STATE_ERR) {
qca808x_phy_ms_seed_enable(phydev, false);
} else {
- qca808x_phy_ms_random_seed_set(phydev);
qca808x_phy_ms_seed_enable(phydev, true);
}
}

View File

@ -1,76 +0,0 @@
From 7cc3209558002d95c0d45a1276ba4f5f741eec42 Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:21 +0800
Subject: [PATCH 3/6] net: phy: at803x: enable qca8081 slave seed conditionally
qca8081 is a single-port PHY; the slave preferred mode is used
by default.
If the PHY master preferred mode is configured, the slave seed
configuration should not be enabled, since the slave seed
enablement is for making the PHY link up in slave mode easily.
Disable the slave seed if the master mode is preferred.
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -1745,6 +1745,12 @@ static int qca808x_phy_ms_seed_enable(st
QCA808X_MASTER_SLAVE_SEED_ENABLE);
}
+static bool qca808x_is_prefer_master(struct phy_device *phydev)
+{
+ return (phydev->master_slave_get == MASTER_SLAVE_CFG_MASTER_FORCE) ||
+ (phydev->master_slave_get == MASTER_SLAVE_CFG_MASTER_PREFERRED);
+}
+
static int qca808x_config_init(struct phy_device *phydev)
{
int ret;
@@ -1766,11 +1772,17 @@ static int qca808x_config_init(struct ph
if (ret)
return ret;
- /* Enable seed and configure lower ramdom seed to make phy linked as slave mode */
- ret = qca808x_phy_ms_seed_enable(phydev, true);
- if (ret)
+ ret = genphy_read_master_slave(phydev);
+ if (ret < 0)
return ret;
+ if (!qca808x_is_prefer_master(phydev)) {
+ /* Enable seed and configure lower ramdom seed to make phy linked as slave mode */
+ ret = qca808x_phy_ms_seed_enable(phydev, true);
+ if (ret)
+ return ret;
+ }
+
/* Configure adc threshold as 100mv for the link 10M */
return at803x_debug_reg_mask(phydev, QCA808X_PHY_DEBUG_ADC_THRESHOLD,
QCA808X_ADC_THRESHOLD_MASK, QCA808X_ADC_THRESHOLD_100MV);
@@ -1802,13 +1814,16 @@ static int qca808x_read_status(struct ph
phydev->interface = PHY_INTERFACE_MODE_SGMII;
} else {
/* generate seed as a lower random value to make PHY linked as SLAVE easily,
- * except for master/slave configuration fault detected.
+ * except for master/slave configuration fault detected or the master mode
+ * preferred.
+ *
* the reason for not putting this code into the function link_change_notify is
* the corner case where the link partner is also the qca8081 PHY and the seed
* value is configured as the same value, the link can't be up and no link change
* occurs.
*/
- if (phydev->master_slave_state == MASTER_SLAVE_STATE_ERR) {
+ if (phydev->master_slave_state == MASTER_SLAVE_STATE_ERR ||
+ qca808x_is_prefer_master(phydev)) {
qca808x_phy_ms_seed_enable(phydev, false);
} else {
qca808x_phy_ms_seed_enable(phydev, true);

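A minimal sketch of the resulting config_init flow, reconstructed from the hunks above (error paths trimmed); master_slave_get is filled in by genphy_read_master_slave() from the configuration requested through ethtool's master-slave setting:

	ret = genphy_read_master_slave(phydev);
	if (ret < 0)
		return ret;

	/* the seed only helps the PHY come up as slave, so skip it
	 * entirely when master mode is forced or preferred */
	if (!qca808x_is_prefer_master(phydev)) {
		ret = qca808x_phy_ms_seed_enable(phydev, true);
		if (ret)
			return ret;
	}
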
View File

@ -1,48 +0,0 @@
From fea7cfb83d1a2782e39cd101dd44ed2548539de5 Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:22 +0800
Subject: [PATCH 4/6] net: phy: at803x: support qca8081 1G chip type
The qca8081 1G chip version does not support the 2.5G capability; it
is distinguished from the qca8081 2.5G chip by bit0 of
register mmd7.0x901d. The 1G version chip otherwise has the same PHY ID
as the normal qca8081 2.5G chip.
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -272,6 +272,10 @@
#define QCA808X_CDT_STATUS_STAT_OPEN 2
#define QCA808X_CDT_STATUS_STAT_SHORT 3
+/* QCA808X 1G chip type */
+#define QCA808X_PHY_MMD7_CHIP_TYPE 0x901d
+#define QCA808X_PHY_CHIP_TYPE_1G BIT(0)
+
MODULE_DESCRIPTION("Qualcomm Atheros AR803x and QCA808X PHY driver");
MODULE_AUTHOR("Matus Ujhelyi");
MODULE_LICENSE("GPL");
@@ -2005,6 +2009,17 @@ static int qca808x_get_features(struct p
*/
linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported);
+	/* As for the qca8081 1G version chip, the 2500baseT ability also
+	 * exists in bit0 of MMD1.21, so we need to remove it manually if
+	 * it is the qca8081 1G chip according to bit0 of MMD7.0x901d.
+ */
+ ret = phy_read_mmd(phydev, MDIO_MMD_AN, QCA808X_PHY_MMD7_CHIP_TYPE);
+ if (ret < 0)
+ return ret;
+
+ if (QCA808X_PHY_CHIP_TYPE_1G & ret)
+ linkmode_clear_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT, phydev->supported);
+
return 0;
}

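For readers unfamiliar with the notation, mmd7.0x901d is register 0x901d within MDIO manageable device 7, i.e. the autonegotiation MMD (MDIO_MMD_AN). A sketch of the check this patch adds to qca808x_get_features():

	ret = phy_read_mmd(phydev, MDIO_MMD_AN, QCA808X_PHY_MMD7_CHIP_TYPE);
	if (ret < 0)
		return ret;

	/* 1G-only silicon: drop the 2500baseT mode that MMD1.21
	 * wrongly advertises */
	if (ret & QCA808X_PHY_CHIP_TYPE_1G)
		linkmode_clear_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
				   phydev->supported);
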
View File

@ -1,98 +0,0 @@
From df9401ff3e6eeaa42bfb06761967f1b71f5afce7 Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:23 +0800
Subject: [PATCH 5/6] net: phy: at803x: remove qca8081 1G fast retrain and
slave seed config
The fast retrain and slave seed configs are only applicable when the 2.5G
capability is supported.
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 50 +++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 18 deletions(-)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -1755,6 +1755,11 @@ static bool qca808x_is_prefer_master(str
(phydev->master_slave_get == MASTER_SLAVE_CFG_MASTER_PREFERRED);
}
+static bool qca808x_has_fast_retrain_or_slave_seed(struct phy_device *phydev)
+{
+ return linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT, phydev->supported);
+}
+
static int qca808x_config_init(struct phy_device *phydev)
{
int ret;
@@ -1771,20 +1776,24 @@ static int qca808x_config_init(struct ph
if (ret)
return ret;
- /* Config the fast retrain for the link 2500M */
- ret = qca808x_phy_fast_retrain_config(phydev);
- if (ret)
- return ret;
-
- ret = genphy_read_master_slave(phydev);
- if (ret < 0)
- return ret;
-
- if (!qca808x_is_prefer_master(phydev)) {
-		/* Enable seed and configure lower random seed to make phy linked as slave mode */
- ret = qca808x_phy_ms_seed_enable(phydev, true);
+ if (qca808x_has_fast_retrain_or_slave_seed(phydev)) {
+ /* Config the fast retrain for the link 2500M */
+ ret = qca808x_phy_fast_retrain_config(phydev);
if (ret)
return ret;
+
+ ret = genphy_read_master_slave(phydev);
+ if (ret < 0)
+ return ret;
+
+ if (!qca808x_is_prefer_master(phydev)) {
+		/* Enable seed and configure lower random seed to make phy
+ * linked as slave mode.
+ */
+ ret = qca808x_phy_ms_seed_enable(phydev, true);
+ if (ret)
+ return ret;
+ }
}
/* Configure adc threshold as 100mv for the link 10M */
@@ -1826,11 +1835,13 @@ static int qca808x_read_status(struct ph
* value is configured as the same value, the link can't be up and no link change
* occurs.
*/
- if (phydev->master_slave_state == MASTER_SLAVE_STATE_ERR ||
- qca808x_is_prefer_master(phydev)) {
- qca808x_phy_ms_seed_enable(phydev, false);
- } else {
- qca808x_phy_ms_seed_enable(phydev, true);
+ if (qca808x_has_fast_retrain_or_slave_seed(phydev)) {
+ if (phydev->master_slave_state == MASTER_SLAVE_STATE_ERR ||
+ qca808x_is_prefer_master(phydev)) {
+ qca808x_phy_ms_seed_enable(phydev, false);
+ } else {
+ qca808x_phy_ms_seed_enable(phydev, true);
+ }
}
}
@@ -1845,7 +1856,10 @@ static int qca808x_soft_reset(struct phy
if (ret < 0)
return ret;
- return qca808x_phy_ms_seed_enable(phydev, true);
+ if (qca808x_has_fast_retrain_or_slave_seed(phydev))
+ ret = qca808x_phy_ms_seed_enable(phydev, true);
+
+ return ret;
}
static bool qca808x_cdt_fault_length_valid(int cdt_code)

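Note why a plain linkmode test is a sufficient chip-type check here: the previous patch already prunes 2500baseT_Full from phydev->supported on 1G silicon in qca808x_get_features(), so by the time config_init() or read_status() runs, the bit is only set on genuine 2.5G parts. A sketch:

	/* true only on 2.5G-capable silicon, thanks to the pruning done
	 * in qca808x_get_features() by the previous patch */
	bool is_2p5g_chip = linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
					      phydev->supported);
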
View File

@ -1,55 +0,0 @@
From 723970affdd8766fa0d91cd34bf2ffc861538b5f Mon Sep 17 00:00:00 2001
From: Luo Jie <quic_luoj@quicinc.com>
Date: Sun, 16 Jul 2023 16:49:24 +0800
Subject: [PATCH 6/6] net: phy: at803x: add qca8081 fifo reset on the link
changed
The qca8081 SGMII FIFO needs to be reset on link down and
released on link up, to recover from abnormal conditions
such as packets getting stuck in the PHY.
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/at803x.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -276,6 +276,9 @@
#define QCA808X_PHY_MMD7_CHIP_TYPE 0x901d
#define QCA808X_PHY_CHIP_TYPE_1G BIT(0)
+#define QCA8081_PHY_SERDES_MMD1_FIFO_CTRL 0x9072
+#define QCA8081_PHY_FIFO_RSTN BIT(11)
+
MODULE_DESCRIPTION("Qualcomm Atheros AR803x and QCA808X PHY driver");
MODULE_AUTHOR("Matus Ujhelyi");
MODULE_LICENSE("GPL");
@@ -2037,6 +2040,16 @@ static int qca808x_get_features(struct p
return 0;
}
+static void qca808x_link_change_notify(struct phy_device *phydev)
+{
+ /* Assert interface sgmii fifo on link down, deassert it on link up,
+ * the interface device address is always phy address added by 1.
+ */
+ mdiobus_c45_modify_changed(phydev->mdio.bus, phydev->mdio.addr + 1,
+ MDIO_MMD_PMAPMD, QCA8081_PHY_SERDES_MMD1_FIFO_CTRL,
+ QCA8081_PHY_FIFO_RSTN, phydev->link ? QCA8081_PHY_FIFO_RSTN : 0);
+}
+
static struct phy_driver at803x_driver[] = {
{
/* Qualcomm Atheros AR8035 */
@@ -2213,6 +2226,7 @@ static struct phy_driver at803x_driver[]
.soft_reset = qca808x_soft_reset,
.cable_test_start = qca808x_cable_test_start,
.cable_test_get_status = qca808x_cable_test_get_status,
+ .link_change_notify = qca808x_link_change_notify,
}, };
module_phy_driver(at803x_driver);

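Expanded for clarity, the single modify call above performs one of two writes on the MDIO address one above the PHY itself (a sketch; given the _RSTN suffix the bit is presumably an active-low reset, so setting it releases the FIFO):

	struct mii_bus *bus = phydev->mdio.bus;
	int serdes_addr = phydev->mdio.addr + 1;

	if (phydev->link)
		/* link up: release the FIFO */
		mdiobus_c45_modify_changed(bus, serdes_addr, MDIO_MMD_PMAPMD,
					   QCA8081_PHY_SERDES_MMD1_FIFO_CTRL,
					   QCA8081_PHY_FIFO_RSTN,
					   QCA8081_PHY_FIFO_RSTN);
	else
		/* link down: hold the FIFO in reset */
		mdiobus_c45_modify_changed(bus, serdes_addr, MDIO_MMD_PMAPMD,
					   QCA8081_PHY_SERDES_MMD1_FIFO_CTRL,
					   QCA8081_PHY_FIFO_RSTN, 0);
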
View File

@ -1,394 +0,0 @@
From 4765a9722e09765866e131ec31f7b9cf4c1f4854 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Sun, 19 Mar 2023 12:57:50 +0000
Subject: [PATCH] net: pcs: add driver for MediaTek SGMII PCS
The SGMII core found in several MediaTek SoCs is identical to what can
also be found in MediaTek's MT7531 Ethernet switch IC.
As this has not always been clear, both drivers developed different
implementations to deal with the PCS.
Recently Alexander Couzens pointed out this fact which lead to the
development of this shared driver.
Add a dedicated driver, mostly by copying the code now found in the
Ethernet driver. The now redundant code will be removed by a follow-up
commit.
Suggested-by: Alexander Couzens <lynxis@fe80.eu>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
MAINTAINERS | 8 +
drivers/net/pcs/Kconfig | 7 +
drivers/net/pcs/Makefile | 1 +
drivers/net/pcs/pcs-mtk-lynxi.c | 305 ++++++++++++++++++++++++++++++
include/linux/pcs/pcs-mtk-lynxi.h | 13 ++
5 files changed, 334 insertions(+)
create mode 100644 drivers/net/pcs/pcs-mtk-lynxi.c
create mode 100644 include/linux/pcs/pcs-mtk-lynxi.h
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12928,6 +12928,14 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/mediatek/
+MEDIATEK ETHERNET PCS DRIVER
+M: Alexander Couzens <lynxis@fe80.eu>
+M: Daniel Golle <daniel@makrotopia.org>
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/pcs/pcs-mtk-lynxi.c
+F: include/linux/pcs/pcs-mtk-lynxi.h
+
MEDIATEK I2C CONTROLLER DRIVER
M: Qii Wang <qii.wang@mediatek.com>
L: linux-i2c@vger.kernel.org
--- a/drivers/net/pcs/Kconfig
+++ b/drivers/net/pcs/Kconfig
@@ -32,4 +32,11 @@ config PCS_ALTERA_TSE
This module provides helper functions for the Altera Triple Speed
Ethernet SGMII PCS, that can be found on the Intel Socfpga family.
+config PCS_MTK_LYNXI
+ tristate
+ select REGMAP
+ help
+ This module provides helpers to phylink for managing the LynxI PCS
+ which is part of MediaTek's SoC and Ethernet switch ICs.
+
endmenu
--- a/drivers/net/pcs/Makefile
+++ b/drivers/net/pcs/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_PCS_XPCS) += pcs_xpcs.o
obj-$(CONFIG_PCS_LYNX) += pcs-lynx.o
obj-$(CONFIG_PCS_RZN1_MIIC) += pcs-rzn1-miic.o
obj-$(CONFIG_PCS_ALTERA_TSE) += pcs-altera-tse.o
+obj-$(CONFIG_PCS_MTK_LYNXI) += pcs-mtk-lynxi.o
--- /dev/null
+++ b/drivers/net/pcs/pcs-mtk-lynxi.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-2019 MediaTek Inc.
+/* A library for MediaTek SGMII circuit
+ *
+ * Author: Sean Wang <sean.wang@mediatek.com>
+ * Author: Alexander Couzens <lynxis@fe80.eu>
+ * Author: Daniel Golle <daniel@makrotopia.org>
+ *
+ */
+
+#include <linux/mdio.h>
+#include <linux/of.h>
+#include <linux/pcs/pcs-mtk-lynxi.h>
+#include <linux/phylink.h>
+#include <linux/regmap.h>
+
+/* SGMII subsystem config registers */
+/* BMCR (low 16) BMSR (high 16) */
+#define SGMSYS_PCS_CONTROL_1 0x0
+#define SGMII_BMCR GENMASK(15, 0)
+#define SGMII_BMSR GENMASK(31, 16)
+
+#define SGMSYS_PCS_DEVICE_ID 0x4
+#define SGMII_LYNXI_DEV_ID 0x4d544950
+
+#define SGMSYS_PCS_ADVERTISE 0x8
+#define SGMII_ADVERTISE GENMASK(15, 0)
+#define SGMII_LPA GENMASK(31, 16)
+
+#define SGMSYS_PCS_SCRATCH 0x14
+#define SGMII_DEV_VERSION GENMASK(31, 16)
+
+/* Register to programmable link timer, the unit in 2 * 8ns */
+#define SGMSYS_PCS_LINK_TIMER 0x18
+#define SGMII_LINK_TIMER_MASK GENMASK(19, 0)
+#define SGMII_LINK_TIMER_VAL(ns) FIELD_PREP(SGMII_LINK_TIMER_MASK, \
+ ((ns) / 2 / 8))
+
+/* Register to control remote fault */
+#define SGMSYS_SGMII_MODE 0x20
+#define SGMII_IF_MODE_SGMII BIT(0)
+#define SGMII_SPEED_DUPLEX_AN BIT(1)
+#define SGMII_SPEED_MASK GENMASK(3, 2)
+#define SGMII_SPEED_10 FIELD_PREP(SGMII_SPEED_MASK, 0)
+#define SGMII_SPEED_100 FIELD_PREP(SGMII_SPEED_MASK, 1)
+#define SGMII_SPEED_1000 FIELD_PREP(SGMII_SPEED_MASK, 2)
+#define SGMII_DUPLEX_HALF BIT(4)
+#define SGMII_REMOTE_FAULT_DIS BIT(8)
+
+/* Register to reset SGMII design */
+#define SGMSYS_RESERVED_0 0x34
+#define SGMII_SW_RESET BIT(0)
+
+/* Register to set SGMII speed, ANA RG_ Control Signals III */
+#define SGMII_PHY_SPEED_MASK GENMASK(3, 2)
+#define SGMII_PHY_SPEED_1_25G FIELD_PREP(SGMII_PHY_SPEED_MASK, 0)
+#define SGMII_PHY_SPEED_3_125G FIELD_PREP(SGMII_PHY_SPEED_MASK, 1)
+
+/* Register to power up QPHY */
+#define SGMSYS_QPHY_PWR_STATE_CTRL 0xe8
+#define SGMII_PHYA_PWD BIT(4)
+
+/* Register to QPHY wrapper control */
+#define SGMSYS_QPHY_WRAP_CTRL 0xec
+#define SGMII_PN_SWAP_MASK GENMASK(1, 0)
+#define SGMII_PN_SWAP_TX_RX (BIT(0) | BIT(1))
+
+/* struct mtk_pcs_lynxi - This structure holds each sgmii regmap and associated
+ * data
+ * @regmap: The register map pointing at the range used to setup
+ * SGMII modes
+ * @dev: Pointer to device owning the PCS
+ * @ana_rgc3: The offset of register ANA_RGC3 relative to regmap
+ * @interface: Currently configured interface mode
+ * @pcs: Phylink PCS structure
+ * @flags: Flags indicating hardware properties
+ */
+struct mtk_pcs_lynxi {
+ struct regmap *regmap;
+ u32 ana_rgc3;
+ phy_interface_t interface;
+ struct phylink_pcs pcs;
+ u32 flags;
+};
+
+static struct mtk_pcs_lynxi *pcs_to_mtk_pcs_lynxi(struct phylink_pcs *pcs)
+{
+ return container_of(pcs, struct mtk_pcs_lynxi, pcs);
+}
+
+static void mtk_pcs_lynxi_get_state(struct phylink_pcs *pcs,
+ struct phylink_link_state *state)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ unsigned int bm, adv;
+
+ /* Read the BMSR and LPA */
+ regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
+ regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
+
+ phylink_mii_c22_pcs_decode_state(state, FIELD_GET(SGMII_BMSR, bm),
+ FIELD_GET(SGMII_LPA, adv));
+}
+
+static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int mode,
+ phy_interface_t interface,
+ const unsigned long *advertising,
+ bool permit_pause_to_mac)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ bool mode_changed = false, changed, use_an;
+ unsigned int rgc3, sgm_mode, bmcr;
+ int advertise, link_timer;
+
+ advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
+ advertising);
+ if (advertise < 0)
+ return advertise;
+
+ /* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
+	 * we assume that fixes its speed at bitrate = line rate (in
+ * other words, 1000Mbps or 2500Mbps).
+ */
+ if (interface == PHY_INTERFACE_MODE_SGMII) {
+ sgm_mode = SGMII_IF_MODE_SGMII;
+ if (phylink_autoneg_inband(mode)) {
+ sgm_mode |= SGMII_REMOTE_FAULT_DIS |
+ SGMII_SPEED_DUPLEX_AN;
+ use_an = true;
+ } else {
+ use_an = false;
+ }
+ } else if (phylink_autoneg_inband(mode)) {
+ /* 1000base-X or 2500base-X autoneg */
+ sgm_mode = SGMII_REMOTE_FAULT_DIS;
+ use_an = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ advertising);
+ } else {
+ /* 1000base-X or 2500base-X without autoneg */
+ sgm_mode = 0;
+ use_an = false;
+ }
+
+ if (use_an)
+ bmcr = BMCR_ANENABLE;
+ else
+ bmcr = 0;
+
+ if (mpcs->interface != interface) {
+ link_timer = phylink_get_link_timer_ns(interface);
+ if (link_timer < 0)
+ return link_timer;
+
+ /* PHYA power down */
+ regmap_set_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD);
+
+ /* Reset SGMII PCS state */
+ regmap_set_bits(mpcs->regmap, SGMSYS_RESERVED_0,
+ SGMII_SW_RESET);
+
+ if (mpcs->flags & MTK_SGMII_FLAG_PN_SWAP)
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_WRAP_CTRL,
+ SGMII_PN_SWAP_MASK,
+ SGMII_PN_SWAP_TX_RX);
+
+ if (interface == PHY_INTERFACE_MODE_2500BASEX)
+ rgc3 = SGMII_PHY_SPEED_3_125G;
+ else
+ rgc3 = SGMII_PHY_SPEED_1_25G;
+
+ /* Configure the underlying interface speed */
+ regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
+ SGMII_PHY_SPEED_MASK, rgc3);
+
+ /* Setup the link timer */
+ regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER,
+ SGMII_LINK_TIMER_VAL(link_timer));
+
+ mpcs->interface = interface;
+ mode_changed = true;
+ }
+
+ /* Update the advertisement, noting whether it has changed */
+ regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
+ SGMII_ADVERTISE, advertise, &changed);
+
+ /* Update the sgmsys mode register */
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_REMOTE_FAULT_DIS | SGMII_SPEED_DUPLEX_AN |
+ SGMII_IF_MODE_SGMII, sgm_mode);
+
+ /* Update the BMCR */
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ BMCR_ANENABLE, bmcr);
+
+ /* Release PHYA power down state
+ * Only removing bit SGMII_PHYA_PWD isn't enough.
+ * There are cases when the SGMII_PHYA_PWD register contains 0x9 which
+ * prevents SGMII from working. The SGMII still shows link but no traffic
+	 * can flow. Writing 0x0 to the PHYA_PWD register fixes the issue. 0x0 was
+ * taken from a good working state of the SGMII interface.
+	 * It is unknown how much delay the QPHY needs, but it is racy without a sleep.
+ * Tested on mt7622 & mt7986.
+ */
+ usleep_range(50, 100);
+ regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, 0);
+
+ return changed || mode_changed;
+}
+
+static void mtk_pcs_lynxi_restart_an(struct phylink_pcs *pcs)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+
+ regmap_set_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1, BMCR_ANRESTART);
+}
+
+static void mtk_pcs_lynxi_link_up(struct phylink_pcs *pcs, unsigned int mode,
+ phy_interface_t interface, int speed,
+ int duplex)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ unsigned int sgm_mode;
+
+ if (!phylink_autoneg_inband(mode)) {
+ /* Force the speed and duplex setting */
+ if (speed == SPEED_10)
+ sgm_mode = SGMII_SPEED_10;
+ else if (speed == SPEED_100)
+ sgm_mode = SGMII_SPEED_100;
+ else
+ sgm_mode = SGMII_SPEED_1000;
+
+ if (duplex != DUPLEX_FULL)
+ sgm_mode |= SGMII_DUPLEX_HALF;
+
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_DUPLEX_HALF | SGMII_SPEED_MASK,
+ sgm_mode);
+ }
+}
+
+static const struct phylink_pcs_ops mtk_pcs_lynxi_ops = {
+ .pcs_get_state = mtk_pcs_lynxi_get_state,
+ .pcs_config = mtk_pcs_lynxi_config,
+ .pcs_an_restart = mtk_pcs_lynxi_restart_an,
+ .pcs_link_up = mtk_pcs_lynxi_link_up,
+};
+
+struct phylink_pcs *mtk_pcs_lynxi_create(struct device *dev,
+ struct regmap *regmap, u32 ana_rgc3,
+ u32 flags)
+{
+ struct mtk_pcs_lynxi *mpcs;
+ u32 id, ver;
+ int ret;
+
+ ret = regmap_read(regmap, SGMSYS_PCS_DEVICE_ID, &id);
+ if (ret < 0)
+ return NULL;
+
+ if (id != SGMII_LYNXI_DEV_ID) {
+ dev_err(dev, "unknown PCS device id %08x\n", id);
+ return NULL;
+ }
+
+ ret = regmap_read(regmap, SGMSYS_PCS_SCRATCH, &ver);
+ if (ret < 0)
+ return NULL;
+
+ ver = FIELD_GET(SGMII_DEV_VERSION, ver);
+ if (ver != 0x1) {
+ dev_err(dev, "unknown PCS device version %04x\n", ver);
+ return NULL;
+ }
+
+ dev_dbg(dev, "MediaTek LynxI SGMII PCS (id 0x%08x, ver 0x%04x)\n", id,
+ ver);
+
+ mpcs = kzalloc(sizeof(*mpcs), GFP_KERNEL);
+ if (!mpcs)
+ return NULL;
+
+ mpcs->ana_rgc3 = ana_rgc3;
+ mpcs->regmap = regmap;
+ mpcs->flags = flags;
+ mpcs->pcs.ops = &mtk_pcs_lynxi_ops;
+ mpcs->pcs.poll = true;
+ mpcs->interface = PHY_INTERFACE_MODE_NA;
+
+ return &mpcs->pcs;
+}
+EXPORT_SYMBOL(mtk_pcs_lynxi_create);
+
+void mtk_pcs_lynxi_destroy(struct phylink_pcs *pcs)
+{
+ if (!pcs)
+ return;
+
+ kfree(pcs_to_mtk_pcs_lynxi(pcs));
+}
+EXPORT_SYMBOL(mtk_pcs_lynxi_destroy);
+
+MODULE_LICENSE("GPL");
--- /dev/null
+++ b/include/linux/pcs/pcs-mtk-lynxi.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_PCS_MTK_LYNXI_H
+#define __LINUX_PCS_MTK_LYNXI_H
+
+#include <linux/phylink.h>
+#include <linux/regmap.h>
+
+#define MTK_SGMII_FLAG_PN_SWAP BIT(0)
+struct phylink_pcs *mtk_pcs_lynxi_create(struct device *dev,
+ struct regmap *regmap,
+ u32 ana_rgc3, u32 flags);
+void mtk_pcs_lynxi_destroy(struct phylink_pcs *pcs);
+#endif

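A sketch of how a consumer driver would use this library (the syscon lookup and the ana_rgc3 offset are illustrative assumptions; the follow-up commit converts mtk_eth_soc along these lines):

	struct regmap *regmap = syscon_node_to_regmap(np);
	struct phylink_pcs *pcs;

	if (IS_ERR(regmap))
		return PTR_ERR(regmap);

	pcs = mtk_pcs_lynxi_create(dev, regmap, ana_rgc3, 0);
	if (!pcs)
		return -ENODEV;

	/* hand the PCS to phylink from the .mac_select_pcs() callback;
	 * on teardown, release it again */
	mtk_pcs_lynxi_destroy(pcs);
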
View File

@ -1,103 +0,0 @@
From affa013f494486079c3c5ad2d00cebc41a3d7445 Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson@seco.com>
Date: Mon, 17 Oct 2022 16:22:36 -0400
Subject: [PATCH 01/21] net: fman: memac: Add serdes support
This adds support for using a serdes which has to be configured. This is
primarily in preparation for the phylink conversion, which will then change the
serdes mode dynamically.
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
.../net/ethernet/freescale/fman/fman_memac.c | 49 ++++++++++++++++++-
1 file changed, 47 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -13,6 +13,7 @@
#include <linux/io.h>
#include <linux/phy.h>
#include <linux/phy_fixed.h>
+#include <linux/phy/phy.h>
#include <linux/of_mdio.h>
/* PCS registers */
@@ -324,6 +325,7 @@ struct fman_mac {
void *fm;
struct fman_rev_info fm_rev_info;
bool basex_if;
+ struct phy *serdes;
struct phy_device *pcsphy;
bool allmulti_enabled;
};
@@ -1203,17 +1205,56 @@ int memac_initialization(struct mac_devi
}
}
+ memac->serdes = devm_of_phy_get(mac_dev->dev, mac_node, "serdes");
+ err = PTR_ERR(memac->serdes);
+ if (err == -ENODEV || err == -ENOSYS) {
+ dev_dbg(mac_dev->dev, "could not get (optional) serdes\n");
+ memac->serdes = NULL;
+ } else if (IS_ERR(memac->serdes)) {
+ dev_err_probe(mac_dev->dev, err, "could not get serdes\n");
+ goto _return_fm_mac_free;
+ } else {
+ err = phy_init(memac->serdes);
+ if (err) {
+ dev_err_probe(mac_dev->dev, err,
+ "could not initialize serdes\n");
+ goto _return_fm_mac_free;
+ }
+
+ err = phy_power_on(memac->serdes);
+ if (err) {
+ dev_err_probe(mac_dev->dev, err,
+ "could not power on serdes\n");
+ goto _return_phy_exit;
+ }
+
+ if (memac->phy_if == PHY_INTERFACE_MODE_SGMII ||
+ memac->phy_if == PHY_INTERFACE_MODE_1000BASEX ||
+ memac->phy_if == PHY_INTERFACE_MODE_2500BASEX ||
+ memac->phy_if == PHY_INTERFACE_MODE_QSGMII ||
+ memac->phy_if == PHY_INTERFACE_MODE_XGMII) {
+ err = phy_set_mode_ext(memac->serdes, PHY_MODE_ETHERNET,
+ memac->phy_if);
+ if (err) {
+ dev_err_probe(mac_dev->dev, err,
+ "could not set serdes mode to %s\n",
+ phy_modes(memac->phy_if));
+ goto _return_phy_power_off;
+ }
+ }
+ }
+
if (!mac_dev->phy_node && of_phy_is_fixed_link(mac_node)) {
struct phy_device *phy;
err = of_phy_register_fixed_link(mac_node);
if (err)
- goto _return_fm_mac_free;
+ goto _return_phy_power_off;
fixed_link = kzalloc(sizeof(*fixed_link), GFP_KERNEL);
if (!fixed_link) {
err = -ENOMEM;
- goto _return_fm_mac_free;
+ goto _return_phy_power_off;
}
mac_dev->phy_node = of_node_get(mac_node);
@@ -1242,6 +1283,10 @@ int memac_initialization(struct mac_devi
goto _return;
+_return_phy_power_off:
+ phy_power_off(memac->serdes);
+_return_phy_exit:
+ phy_exit(memac->serdes);
_return_fixed_link_free:
kfree(fixed_link);
_return_fm_mac_free:

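The optional-get pattern above hinges on the two errno values: -ENODEV means the mac_node simply has no "serdes" phandle, while -ENOSYS means the generic PHY framework is compiled out (CONFIG_GENERIC_PHY=n). A condensed sketch of the same logic:

	struct phy *serdes = devm_of_phy_get(dev, mac_node, "serdes");

	if (IS_ERR(serdes)) {
		int err = PTR_ERR(serdes);

		if (err == -ENODEV || err == -ENOSYS)
			serdes = NULL;	/* optional: carry on without it */
		else
			return dev_err_probe(dev, err, "could not get serdes\n");
	}
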
View File

@ -1,384 +0,0 @@
From fe60e7154d3a35af975c5e6570d6ec31aab9a731 Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson@seco.com>
Date: Mon, 17 Oct 2022 16:22:37 -0400
Subject: [PATCH 02/21] net: fman: memac: Use lynx pcs driver
Although not stated in the datasheet, as far as I can tell PCS for mEMACs
is a "Lynx." By reusing the existing driver, we can remove the PCS
management code from the memac driver. This requires calling some PCS
functions manually which phylink would usually do for us, but we will let
it do that soon.
One problem is that we don't actually have a PCS for QSGMII. We pretend
that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case.
Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an
issue yet, because we never get the PCS state. However, it will be once the
conversion to phylink is complete, since the links will appear to never
come up. To get around this, we allow specifying multiple PCSs in pcsphy.
This breaks backwards compatibility with old device trees, but only for
QSGMII. IMO this is the only reasonable way to figure out what the actual
QSGMII PCS is.
Additionally, we now also support a separate XFI PCS. This can allow the
SerDes driver to set different addresses for the SGMII and XFI PCSs so they
can be accessed at the same time.
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/freescale/fman/Kconfig | 3 +
.../net/ethernet/freescale/fman/fman_memac.c | 258 +++++++-----------
2 files changed, 105 insertions(+), 156 deletions(-)
--- a/drivers/net/ethernet/freescale/fman/Kconfig
+++ b/drivers/net/ethernet/freescale/fman/Kconfig
@@ -4,6 +4,9 @@ config FSL_FMAN
depends on FSL_SOC || ARCH_LAYERSCAPE || COMPILE_TEST
select GENERIC_ALLOCATOR
select PHYLIB
+ select PHYLINK
+ select PCS
+ select PCS_LYNX
select CRC32
default n
help
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -11,43 +11,12 @@
#include <linux/slab.h>
#include <linux/io.h>
+#include <linux/pcs-lynx.h>
#include <linux/phy.h>
#include <linux/phy_fixed.h>
#include <linux/phy/phy.h>
#include <linux/of_mdio.h>
-/* PCS registers */
-#define MDIO_SGMII_CR 0x00
-#define MDIO_SGMII_DEV_ABIL_SGMII 0x04
-#define MDIO_SGMII_LINK_TMR_L 0x12
-#define MDIO_SGMII_LINK_TMR_H 0x13
-#define MDIO_SGMII_IF_MODE 0x14
-
-/* SGMII Control defines */
-#define SGMII_CR_AN_EN 0x1000
-#define SGMII_CR_RESTART_AN 0x0200
-#define SGMII_CR_FD 0x0100
-#define SGMII_CR_SPEED_SEL1_1G 0x0040
-#define SGMII_CR_DEF_VAL (SGMII_CR_AN_EN | SGMII_CR_FD | \
- SGMII_CR_SPEED_SEL1_1G)
-
-/* SGMII Device Ability for SGMII defines */
-#define MDIO_SGMII_DEV_ABIL_SGMII_MODE 0x4001
-#define MDIO_SGMII_DEV_ABIL_BASEX_MODE 0x01A0
-
-/* Link timer define */
-#define LINK_TMR_L 0xa120
-#define LINK_TMR_H 0x0007
-#define LINK_TMR_L_BASEX 0xaf08
-#define LINK_TMR_H_BASEX 0x002f
-
-/* SGMII IF Mode defines */
-#define IF_MODE_USE_SGMII_AN 0x0002
-#define IF_MODE_SGMII_EN 0x0001
-#define IF_MODE_SGMII_SPEED_100M 0x0004
-#define IF_MODE_SGMII_SPEED_1G 0x0008
-#define IF_MODE_SGMII_DUPLEX_HALF 0x0010
-
/* Num of additional exact match MAC adr regs */
#define MEMAC_NUM_OF_PADDRS 7
@@ -326,7 +295,9 @@ struct fman_mac {
struct fman_rev_info fm_rev_info;
bool basex_if;
struct phy *serdes;
- struct phy_device *pcsphy;
+ struct phylink_pcs *sgmii_pcs;
+ struct phylink_pcs *qsgmii_pcs;
+ struct phylink_pcs *xfi_pcs;
bool allmulti_enabled;
};
@@ -487,91 +458,22 @@ static u32 get_mac_addr_hash_code(u64 et
return xor_val;
}
-static void setup_sgmii_internal_phy(struct fman_mac *memac,
- struct fixed_phy_status *fixed_link)
-{
- u16 tmp_reg16;
-
- if (WARN_ON(!memac->pcsphy))
- return;
-
- /* SGMII mode */
- tmp_reg16 = IF_MODE_SGMII_EN;
- if (!fixed_link)
- /* AN enable */
- tmp_reg16 |= IF_MODE_USE_SGMII_AN;
- else {
- switch (fixed_link->speed) {
- case 10:
- /* For 10M: IF_MODE[SPEED_10M] = 0 */
- break;
- case 100:
- tmp_reg16 |= IF_MODE_SGMII_SPEED_100M;
- break;
- case 1000:
- default:
- tmp_reg16 |= IF_MODE_SGMII_SPEED_1G;
- break;
- }
- if (!fixed_link->duplex)
- tmp_reg16 |= IF_MODE_SGMII_DUPLEX_HALF;
- }
- phy_write(memac->pcsphy, MDIO_SGMII_IF_MODE, tmp_reg16);
-
- /* Device ability according to SGMII specification */
- tmp_reg16 = MDIO_SGMII_DEV_ABIL_SGMII_MODE;
- phy_write(memac->pcsphy, MDIO_SGMII_DEV_ABIL_SGMII, tmp_reg16);
-
- /* Adjust link timer for SGMII -
- * According to Cisco SGMII specification the timer should be 1.6 ms.
- * The link_timer register is configured in units of the clock.
- * - When running as 1G SGMII, Serdes clock is 125 MHz, so
- * unit = 1 / (125*10^6 Hz) = 8 ns.
- * 1.6 ms in units of 8 ns = 1.6ms / 8ns = 2*10^5 = 0x30d40
- * - When running as 2.5G SGMII, Serdes clock is 312.5 MHz, so
- * unit = 1 / (312.5*10^6 Hz) = 3.2 ns.
- * 1.6 ms in units of 3.2 ns = 1.6ms / 3.2ns = 5*10^5 = 0x7a120.
- * Since link_timer value of 1G SGMII will be too short for 2.5 SGMII,
- * we always set up here a value of 2.5 SGMII.
- */
- phy_write(memac->pcsphy, MDIO_SGMII_LINK_TMR_H, LINK_TMR_H);
- phy_write(memac->pcsphy, MDIO_SGMII_LINK_TMR_L, LINK_TMR_L);
-
- if (!fixed_link)
- /* Restart AN */
- tmp_reg16 = SGMII_CR_DEF_VAL | SGMII_CR_RESTART_AN;
+static void setup_sgmii_internal(struct fman_mac *memac,
+ struct phylink_pcs *pcs,
+ struct fixed_phy_status *fixed_link)
+{
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(advertising);
+ phy_interface_t iface = memac->basex_if ? PHY_INTERFACE_MODE_1000BASEX :
+ PHY_INTERFACE_MODE_SGMII;
+ unsigned int mode = fixed_link ? MLO_AN_FIXED : MLO_AN_INBAND;
+
+ linkmode_set_pause(advertising, true, true);
+ pcs->ops->pcs_config(pcs, mode, iface, advertising, true);
+ if (fixed_link)
+ pcs->ops->pcs_link_up(pcs, mode, iface, fixed_link->speed,
+ fixed_link->duplex);
else
- /* AN disabled */
- tmp_reg16 = SGMII_CR_DEF_VAL & ~SGMII_CR_AN_EN;
- phy_write(memac->pcsphy, 0x0, tmp_reg16);
-}
-
-static void setup_sgmii_internal_phy_base_x(struct fman_mac *memac)
-{
- u16 tmp_reg16;
-
- /* AN Device capability */
- tmp_reg16 = MDIO_SGMII_DEV_ABIL_BASEX_MODE;
- phy_write(memac->pcsphy, MDIO_SGMII_DEV_ABIL_SGMII, tmp_reg16);
-
- /* Adjust link timer for SGMII -
- * For Serdes 1000BaseX auto-negotiation the timer should be 10 ms.
- * The link_timer register is configured in units of the clock.
- * - When running as 1G SGMII, Serdes clock is 125 MHz, so
- * unit = 1 / (125*10^6 Hz) = 8 ns.
- * 10 ms in units of 8 ns = 10ms / 8ns = 1250000 = 0x1312d0
- * - When running as 2.5G SGMII, Serdes clock is 312.5 MHz, so
- * unit = 1 / (312.5*10^6 Hz) = 3.2 ns.
- * 10 ms in units of 3.2 ns = 10ms / 3.2ns = 3125000 = 0x2faf08.
- * Since link_timer value of 1G SGMII will be too short for 2.5 SGMII,
- * we always set up here a value of 2.5 SGMII.
- */
- phy_write(memac->pcsphy, MDIO_SGMII_LINK_TMR_H, LINK_TMR_H_BASEX);
- phy_write(memac->pcsphy, MDIO_SGMII_LINK_TMR_L, LINK_TMR_L_BASEX);
-
- /* Restart AN */
- tmp_reg16 = SGMII_CR_DEF_VAL | SGMII_CR_RESTART_AN;
- phy_write(memac->pcsphy, 0x0, tmp_reg16);
+ pcs->ops->pcs_an_restart(pcs);
}
static int check_init_parameters(struct fman_mac *memac)
@@ -983,7 +885,6 @@ static int memac_set_exception(struct fm
static int memac_init(struct fman_mac *memac)
{
struct memac_cfg *memac_drv_param;
- u8 i;
enet_addr_t eth_addr;
bool slow_10g_if = false;
struct fixed_phy_status *fixed_link = NULL;
@@ -1036,32 +937,10 @@ static int memac_init(struct fman_mac *m
iowrite32be(reg32, &memac->regs->command_config);
}
- if (memac->phy_if == PHY_INTERFACE_MODE_SGMII) {
- /* Configure internal SGMII PHY */
- if (memac->basex_if)
- setup_sgmii_internal_phy_base_x(memac);
- else
- setup_sgmii_internal_phy(memac, fixed_link);
- } else if (memac->phy_if == PHY_INTERFACE_MODE_QSGMII) {
- /* Configure 4 internal SGMII PHYs */
- for (i = 0; i < 4; i++) {
- u8 qsmgii_phy_addr, phy_addr;
- /* QSGMII PHY address occupies 3 upper bits of 5-bit
- * phy_address; the lower 2 bits are used to extend
- * register address space and access each one of 4
- * ports inside QSGMII.
- */
- phy_addr = memac->pcsphy->mdio.addr;
- qsmgii_phy_addr = (u8)((phy_addr << 2) | i);
- memac->pcsphy->mdio.addr = qsmgii_phy_addr;
- if (memac->basex_if)
- setup_sgmii_internal_phy_base_x(memac);
- else
- setup_sgmii_internal_phy(memac, fixed_link);
-
- memac->pcsphy->mdio.addr = phy_addr;
- }
- }
+ if (memac->phy_if == PHY_INTERFACE_MODE_SGMII)
+ setup_sgmii_internal(memac, memac->sgmii_pcs, fixed_link);
+ else if (memac->phy_if == PHY_INTERFACE_MODE_QSGMII)
+ setup_sgmii_internal(memac, memac->qsgmii_pcs, fixed_link);
/* Max Frame Length */
err = fman_set_mac_max_frame(memac->fm, memac->mac_id,
@@ -1097,12 +976,25 @@ static int memac_init(struct fman_mac *m
return 0;
}
+static void pcs_put(struct phylink_pcs *pcs)
+{
+ struct mdio_device *mdiodev;
+
+ if (IS_ERR_OR_NULL(pcs))
+ return;
+
+ mdiodev = lynx_get_mdio_device(pcs);
+ lynx_pcs_destroy(pcs);
+ mdio_device_free(mdiodev);
+}
+
static int memac_free(struct fman_mac *memac)
{
free_init_resources(memac);
- if (memac->pcsphy)
- put_device(&memac->pcsphy->mdio.dev);
+ pcs_put(memac->sgmii_pcs);
+ pcs_put(memac->qsgmii_pcs);
+ pcs_put(memac->xfi_pcs);
kfree(memac->memac_drv_param);
kfree(memac);
@@ -1153,12 +1045,31 @@ static struct fman_mac *memac_config(str
return memac;
}
+static struct phylink_pcs *memac_pcs_create(struct device_node *mac_node,
+ int index)
+{
+ struct device_node *node;
+ struct mdio_device *mdiodev = NULL;
+ struct phylink_pcs *pcs;
+
+ node = of_parse_phandle(mac_node, "pcsphy-handle", index);
+ if (node && of_device_is_available(node))
+ mdiodev = of_mdio_find_device(node);
+ of_node_put(node);
+
+ if (!mdiodev)
+ return ERR_PTR(-EPROBE_DEFER);
+
+ pcs = lynx_pcs_create(mdiodev);
+ return pcs;
+}
+
int memac_initialization(struct mac_device *mac_dev,
struct device_node *mac_node,
struct fman_mac_params *params)
{
int err;
- struct device_node *phy_node;
+ struct phylink_pcs *pcs;
struct fixed_phy_status *fixed_link;
struct fman_mac *memac;
@@ -1188,23 +1099,58 @@ int memac_initialization(struct mac_devi
memac = mac_dev->fman_mac;
memac->memac_drv_param->max_frame_length = fman_get_max_frm();
memac->memac_drv_param->reset_on_init = true;
- if (memac->phy_if == PHY_INTERFACE_MODE_SGMII ||
- memac->phy_if == PHY_INTERFACE_MODE_QSGMII) {
- phy_node = of_parse_phandle(mac_node, "pcsphy-handle", 0);
- if (!phy_node) {
- pr_err("PCS PHY node is not available\n");
- err = -EINVAL;
+
+ err = of_property_match_string(mac_node, "pcs-handle-names", "xfi");
+ if (err >= 0) {
+ memac->xfi_pcs = memac_pcs_create(mac_node, err);
+ if (IS_ERR(memac->xfi_pcs)) {
+ err = PTR_ERR(memac->xfi_pcs);
+ dev_err_probe(mac_dev->dev, err, "missing xfi pcs\n");
goto _return_fm_mac_free;
}
+ } else if (err != -EINVAL && err != -ENODATA) {
+ goto _return_fm_mac_free;
+ }
- memac->pcsphy = of_phy_find_device(phy_node);
- if (!memac->pcsphy) {
- pr_err("of_phy_find_device (PCS PHY) failed\n");
- err = -EINVAL;
+ err = of_property_match_string(mac_node, "pcs-handle-names", "qsgmii");
+ if (err >= 0) {
+ memac->qsgmii_pcs = memac_pcs_create(mac_node, err);
+ if (IS_ERR(memac->qsgmii_pcs)) {
+ err = PTR_ERR(memac->qsgmii_pcs);
+ dev_err_probe(mac_dev->dev, err,
+ "missing qsgmii pcs\n");
goto _return_fm_mac_free;
}
+ } else if (err != -EINVAL && err != -ENODATA) {
+ goto _return_fm_mac_free;
+ }
+
+ /* For compatibility, if pcs-handle-names is missing, we assume this
+ * phy is the first one in pcsphy-handle
+ */
+ err = of_property_match_string(mac_node, "pcs-handle-names", "sgmii");
+ if (err == -EINVAL || err == -ENODATA)
+ pcs = memac_pcs_create(mac_node, 0);
+ else if (err < 0)
+ goto _return_fm_mac_free;
+ else
+ pcs = memac_pcs_create(mac_node, err);
+
+ if (!pcs) {
+ dev_err(mac_dev->dev, "missing pcs\n");
+ err = -ENOENT;
+ goto _return_fm_mac_free;
}
+ /* If err is set here, it means that pcs-handle-names was missing above
+ * (and therefore that xfi_pcs cannot be set). If we are defaulting to
+ * XGMII, assume this is for XFI. Otherwise, assume it is for SGMII.
+ */
+ if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
+ memac->xfi_pcs = pcs;
+ else
+ memac->sgmii_pcs = pcs;
+
memac->serdes = devm_of_phy_get(mac_dev->dev, mac_node, "serdes");
err = PTR_ERR(memac->serdes);
if (err == -ENODEV || err == -ENOSYS) {

View File

@ -1,93 +0,0 @@
From bf4de031052fe7c5309e8956c342d4e5ce79038e Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Mon, 17 Oct 2022 16:22:35 -0400
Subject: [PATCH 04/21] net: phylink: provide phylink_validate_mask_caps()
helper
Provide a helper that restricts the link modes according to the
phylink capabilities.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[rebased on net-next/master and added documentation]
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phylink.c | 41 +++++++++++++++++++++++++++------------
include/linux/phylink.h | 3 +++
2 files changed, 32 insertions(+), 12 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -564,31 +564,48 @@ unsigned long phylink_get_capabilities(p
EXPORT_SYMBOL_GPL(phylink_get_capabilities);
/**
- * phylink_generic_validate() - generic validate() callback implementation
- * @config: a pointer to a &struct phylink_config.
+ * phylink_validate_mask_caps() - Restrict link modes based on caps
* @supported: ethtool bitmask for supported link modes.
- * @state: a pointer to a &struct phylink_link_state.
+ * @state: an (optional) pointer to a &struct phylink_link_state.
+ * @mac_capabilities: bitmask of MAC capabilities
*
- * Generic implementation of the validate() callback that MAC drivers can
- * use when they pass the range of supported interfaces and MAC capabilities.
- * This makes use of phylink_get_linkmodes().
+ * Calculate the supported link modes based on @mac_capabilities, and restrict
+ * @supported and @state based on that. Use this function if your capabilities
+ * aren't constant, such as if they vary depending on the interface.
*/
-void phylink_generic_validate(struct phylink_config *config,
- unsigned long *supported,
- struct phylink_link_state *state)
+void phylink_validate_mask_caps(unsigned long *supported,
+ struct phylink_link_state *state,
+ unsigned long mac_capabilities)
{
__ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
unsigned long caps;
phylink_set_port_modes(mask);
phylink_set(mask, Autoneg);
- caps = phylink_get_capabilities(state->interface,
- config->mac_capabilities,
+ caps = phylink_get_capabilities(state->interface, mac_capabilities,
state->rate_matching);
phylink_caps_to_linkmodes(mask, caps);
linkmode_and(supported, supported, mask);
- linkmode_and(state->advertising, state->advertising, mask);
+ if (state)
+ linkmode_and(state->advertising, state->advertising, mask);
+}
+EXPORT_SYMBOL_GPL(phylink_validate_mask_caps);
+
+/**
+ * phylink_generic_validate() - generic validate() callback implementation
+ * @config: a pointer to a &struct phylink_config.
+ * @supported: ethtool bitmask for supported link modes.
+ * @state: a pointer to a &struct phylink_link_state.
+ *
+ * Generic implementation of the validate() callback that MAC drivers can
+ * use when they pass the range of supported interfaces and MAC capabilities.
+ */
+void phylink_generic_validate(struct phylink_config *config,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+ phylink_validate_mask_caps(supported, state, config->mac_capabilities);
}
EXPORT_SYMBOL_GPL(phylink_generic_validate);
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -558,6 +558,9 @@ void phylink_caps_to_linkmodes(unsigned
unsigned long phylink_get_capabilities(phy_interface_t interface,
unsigned long mac_capabilities,
int rate_matching);
+void phylink_validate_mask_caps(unsigned long *supported,
+ struct phylink_link_state *state,
+ unsigned long caps);
void phylink_generic_validate(struct phylink_config *config,
unsigned long *supported,
struct phylink_link_state *state);

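As a hypothetical illustration of the intended use case (none of the names below come from a real driver): a MAC whose capabilities depend on the interface mode can now implement .validate without reimplementing the mask plumbing:

static void foo_validate(struct phylink_config *config,
			 unsigned long *supported,
			 struct phylink_link_state *state)
{
	unsigned long caps = MAC_SYM_PAUSE | MAC_10 | MAC_100 | MAC_1000FD;

	/* 2.5G is only attainable in one interface mode of this
	 * imaginary MAC, so the caps are not constant */
	if (state->interface == PHY_INTERFACE_MODE_2500BASEX)
		caps |= MAC_2500FD;

	phylink_validate_mask_caps(supported, state, caps);
}
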
View File

@ -1,39 +0,0 @@
From 2bf7e4a68c42eed909f3c29582e1fb85cb157e35 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 25 Oct 2022 11:51:26 -0700
Subject: [PATCH 05/21] phylink: require valid state argument to
phylink_validate_mask_caps()
state is dereferenced earlier in the function, so the NULL check
is pointless. Since we don't have any crash reports, presumably
it's safe to assume state is not NULL.
Fixes: f392a1846489 ("net: phylink: provide phylink_validate_mask_caps() helper")
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/20221025185126.1720553-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -566,7 +566,7 @@ EXPORT_SYMBOL_GPL(phylink_get_capabiliti
/**
* phylink_validate_mask_caps() - Restrict link modes based on caps
* @supported: ethtool bitmask for supported link modes.
- * @state: an (optional) pointer to a &struct phylink_link_state.
+ * @state: pointer to a &struct phylink_link_state.
* @mac_capabilities: bitmask of MAC capabilities
*
* Calculate the supported link modes based on @mac_capabilities, and restrict
@@ -587,8 +587,7 @@ void phylink_validate_mask_caps(unsigned
phylink_caps_to_linkmodes(mask, caps);
linkmode_and(supported, supported, mask);
- if (state)
- linkmode_and(state->advertising, state->advertising, mask);
+ linkmode_and(state->advertising, state->advertising, mask);
}
EXPORT_SYMBOL_GPL(phylink_validate_mask_caps);

View File

@ -1,48 +0,0 @@
From f8fc363bf0c023e4736a0328174b4a24b44ab23a Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:37 +0100
Subject: [PATCH 06/21] net: phylink: add phylink_get_link_timer_ns() helper
Add a helper to convert the PHY interface mode to the required link
timer setting as stated by the appropriate standard. Inappropriate
interface modes return an error.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/phylink.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -617,6 +617,30 @@ int phylink_speed_up(struct phylink *pl)
void phylink_set_port_modes(unsigned long *bits);
+/**
+ * phylink_get_link_timer_ns - return the PCS link timer value
+ * @interface: link &typedef phy_interface_t mode
+ *
+ * Return the PCS link timer setting in nanoseconds for the PHY @interface
+ * mode, or -EINVAL if not appropriate.
+ */
+static inline int phylink_get_link_timer_ns(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_QSGMII:
+ case PHY_INTERFACE_MODE_USXGMII:
+ return 1600000;
+
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ return 10000000;
+
+ default:
+ return -EINVAL;
+ }
+}
+
void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
u16 bmsr, u16 lpa);
void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs,

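Cross-checking against the MediaTek LynxI patch above, whose link timer counts in units of 2 * 8 ns, the returned values program as one would expect:

	/* SGMII/QSGMII/USXGMII:  1600000 ns / 16 ns = 100000 = 0x186a0
	 * 1000base-X/2500base-X: 10000000 ns / 16 ns = 625000 = 0x98968
	 * which is exactly what SGMII_LINK_TIMER_VAL(link_timer) computes
	 * from the value returned by phylink_get_link_timer_ns(). */
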
View File

@ -1,250 +0,0 @@
From b45b773a96b0e9e8d51e5d005485f4e376d6ce9a Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 4 Nov 2022 17:13:01 +0000
Subject: [PATCH 07/21] net: remove explicit phylink_generic_validate()
references
Virtually all conventional network drivers are now converted to use
phylink_generic_validate() - only DSA drivers and fman_memac remain,
so lets remove the necessity for network drivers to explicitly set
this member, and default to phylink_generic_validate() when unset.
This is possible as .validate must currently be set.
Any remaining instances that have not been addressed by this patch can
be fixed up later.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/E1or0FZ-001tRa-DI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/altera/altera_tse_main.c | 1 -
drivers/net/ethernet/atheros/ag71xx.c | 1 -
drivers/net/ethernet/cadence/macb_main.c | 1 -
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c | 1 -
drivers/net/ethernet/freescale/enetc/enetc_pf.c | 1 -
drivers/net/ethernet/freescale/fman/fman_dtsec.c | 1 -
drivers/net/ethernet/freescale/fman/fman_tgec.c | 1 -
drivers/net/ethernet/marvell/mvneta.c | 1 -
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 1 -
drivers/net/ethernet/marvell/prestera/prestera_main.c | 1 -
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 -
drivers/net/ethernet/microchip/lan966x/lan966x_phylink.c | 1 -
drivers/net/ethernet/microchip/sparx5/sparx5_phylink.c | 1 -
drivers/net/ethernet/mscc/ocelot_net.c | 1 -
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 -
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 1 -
drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 -
drivers/net/phy/phylink.c | 5 ++++-
drivers/net/usb/asix_devices.c | 1 -
include/linux/phylink.h | 5 +++++
20 files changed, 9 insertions(+), 19 deletions(-)
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -1096,7 +1096,6 @@ static struct phylink_pcs *alt_tse_selec
}
static const struct phylink_mac_ops alt_tse_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_an_restart = alt_tse_mac_an_restart,
.mac_config = alt_tse_mac_config,
.mac_link_down = alt_tse_mac_link_down,
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -1086,7 +1086,6 @@ static void ag71xx_mac_link_up(struct ph
}
static const struct phylink_mac_ops ag71xx_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_config = ag71xx_mac_config,
.mac_link_down = ag71xx_mac_link_down,
.mac_link_up = ag71xx_mac_link_up,
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -752,7 +752,6 @@ static struct phylink_pcs *macb_mac_sele
}
static const struct phylink_mac_ops macb_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = macb_mac_select_pcs,
.mac_config = macb_mac_config,
.mac_link_down = macb_mac_link_down,
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
@@ -235,7 +235,6 @@ static void dpaa2_mac_link_down(struct p
}
static const struct phylink_mac_ops dpaa2_mac_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = dpaa2_mac_select_pcs,
.mac_config = dpaa2_mac_config,
.mac_link_up = dpaa2_mac_link_up,
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -1111,7 +1111,6 @@ static void enetc_pl_mac_link_down(struc
}
static const struct phylink_mac_ops enetc_mac_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = enetc_pl_mac_select_pcs,
.mac_config = enetc_pl_mac_config,
.mac_link_up = enetc_pl_mac_link_up,
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
@@ -986,7 +986,6 @@ static void dtsec_link_down(struct phyli
}
static const struct phylink_mac_ops dtsec_mac_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = dtsec_select_pcs,
.mac_config = dtsec_mac_config,
.mac_link_up = dtsec_link_up,
--- a/drivers/net/ethernet/freescale/fman/fman_tgec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_tgec.c
@@ -469,7 +469,6 @@ static void tgec_link_down(struct phylin
}
static const struct phylink_mac_ops tgec_mac_ops = {
- .validate = phylink_generic_validate,
.mac_config = tgec_mac_config,
.mac_link_up = tgec_link_up,
.mac_link_down = tgec_link_down,
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4228,7 +4228,6 @@ static void mvneta_mac_link_up(struct ph
}
static const struct phylink_mac_ops mvneta_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = mvneta_mac_select_pcs,
.mac_prepare = mvneta_mac_prepare,
.mac_config = mvneta_mac_config,
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -6633,7 +6633,6 @@ static void mvpp2_mac_link_down(struct p
}
static const struct phylink_mac_ops mvpp2_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = mvpp2_select_pcs,
.mac_prepare = mvpp2_mac_prepare,
.mac_config = mvpp2_mac_config,
--- a/drivers/net/ethernet/marvell/prestera/prestera_main.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c
@@ -360,7 +360,6 @@ static void prestera_pcs_an_restart(stru
}
static const struct phylink_mac_ops prestera_mac_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = prestera_mac_select_pcs,
.mac_config = prestera_mac_config,
.mac_link_down = prestera_mac_link_down,
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -654,7 +654,6 @@ static void mtk_mac_link_up(struct phyli
}
static const struct phylink_mac_ops mtk_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = mtk_mac_select_pcs,
.mac_pcs_get_state = mtk_mac_pcs_get_state,
.mac_config = mtk_mac_config,
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_phylink.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_phylink.c
@@ -125,7 +125,6 @@ static void lan966x_pcs_aneg_restart(str
}
const struct phylink_mac_ops lan966x_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = lan966x_phylink_mac_select,
.mac_config = lan966x_phylink_mac_config,
.mac_prepare = lan966x_phylink_mac_prepare,
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_phylink.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_phylink.c
@@ -138,7 +138,6 @@ const struct phylink_pcs_ops sparx5_phyl
};
const struct phylink_mac_ops sparx5_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = sparx5_phylink_mac_select_pcs,
.mac_config = sparx5_phylink_mac_config,
.mac_link_down = sparx5_phylink_mac_link_down,
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1737,7 +1737,6 @@ static void vsc7514_phylink_mac_link_up(
}
static const struct phylink_mac_ops ocelot_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_config = vsc7514_phylink_mac_config,
.mac_link_down = vsc7514_phylink_mac_link_down,
.mac_link_up = vsc7514_phylink_mac_link_up,
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1090,7 +1090,6 @@ static void stmmac_mac_link_up(struct ph
}
static const struct phylink_mac_ops stmmac_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = stmmac_mac_select_pcs,
.mac_config = stmmac_mac_config,
.mac_link_down = stmmac_mac_link_down,
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -1493,7 +1493,6 @@ static void am65_cpsw_nuss_mac_link_up(s
}
static const struct phylink_mac_ops am65_cpsw_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_config = am65_cpsw_nuss_mac_config,
.mac_link_down = am65_cpsw_nuss_mac_link_down,
.mac_link_up = am65_cpsw_nuss_mac_link_up,
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1736,7 +1736,6 @@ static void axienet_mac_link_up(struct p
}
static const struct phylink_mac_ops axienet_phylink_ops = {
- .validate = phylink_generic_validate,
.mac_select_pcs = axienet_mac_select_pcs,
.mac_config = axienet_mac_config,
.mac_link_down = axienet_mac_link_down,
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -651,7 +651,10 @@ static int phylink_validate_mac_and_pcs(
}
/* Then validate the link parameters with the MAC */
- pl->mac_ops->validate(pl->config, supported, state);
+ if (pl->mac_ops->validate)
+ pl->mac_ops->validate(pl->config, supported, state);
+ else
+ phylink_generic_validate(pl->config, supported, state);
return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
}
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -787,7 +787,6 @@ static void ax88772_mac_link_up(struct p
}
static const struct phylink_mac_ops ax88772_phylink_mac_ops = {
- .validate = phylink_generic_validate,
.mac_config = ax88772_mac_config,
.mac_link_down = ax88772_mac_link_down,
.mac_link_up = ax88772_mac_link_up,
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -207,6 +207,11 @@ struct phylink_mac_ops {
*
* If the @state->interface mode is not supported, then the @supported
* mask must be cleared.
+ *
+ * This member is optional; if not set, the generic validator will be
+ * used making use of @config->mac_capabilities and
+ * @config->supported_interfaces to determine which link modes are
+ * supported.
*/
void validate(struct phylink_config *config, unsigned long *supported,
struct phylink_link_state *state);

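After this change a conventional driver's ops table shrinks to something like the following sketch (names are illustrative):

static const struct phylink_mac_ops foo_phylink_ops = {
	/* .validate omitted: phylink now falls back to
	 * phylink_generic_validate() automatically */
	.mac_select_pcs = foo_select_pcs,
	.mac_config = foo_mac_config,
	.mac_link_down = foo_mac_link_down,
	.mac_link_up = foo_mac_link_up,
};
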
View File

@ -1,48 +0,0 @@
From a90ac762d345890b40d88a1385a34a2449c2d75e Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 24 Mar 2023 09:23:42 +0000
Subject: [PATCH] net: sfp: make sfp_bus_find_fwnode() take a const fwnode
sfp_bus_find_fwnode() does not write to the fwnode, so let's make it
const.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/sfp-bus.c | 2 +-
include/linux/sfp.h | 5 +++--
2 files changed, 4 insertions(+), 3 deletions(-)
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -603,7 +603,7 @@ static void sfp_upstream_clear(struct sf
* - %-ENOMEM if we failed to allocate the bus.
* - an error from the upstream's connect_phy() method.
*/
-struct sfp_bus *sfp_bus_find_fwnode(struct fwnode_handle *fwnode)
+struct sfp_bus *sfp_bus_find_fwnode(const struct fwnode_handle *fwnode)
{
struct fwnode_reference_args ref;
struct sfp_bus *bus;
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -548,7 +548,7 @@ int sfp_get_module_eeprom_by_page(struct
void sfp_upstream_start(struct sfp_bus *bus);
void sfp_upstream_stop(struct sfp_bus *bus);
void sfp_bus_put(struct sfp_bus *bus);
-struct sfp_bus *sfp_bus_find_fwnode(struct fwnode_handle *fwnode);
+struct sfp_bus *sfp_bus_find_fwnode(const struct fwnode_handle *fwnode);
int sfp_bus_add_upstream(struct sfp_bus *bus, void *upstream,
const struct sfp_upstream_ops *ops);
void sfp_bus_del_upstream(struct sfp_bus *bus);
@@ -610,7 +610,8 @@ static inline void sfp_bus_put(struct sf
{
}
-static inline struct sfp_bus *sfp_bus_find_fwnode(struct fwnode_handle *fwnode)
+static inline struct sfp_bus *
+sfp_bus_find_fwnode(const struct fwnode_handle *fwnode)
{
return NULL;
}

View File

@ -1,31 +0,0 @@
From ecec0ebbc6381a5a375f1cf10c4858f24e91e2ef Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Wed, 15 Mar 2023 14:46:49 +0000
Subject: [PATCH] net: pcs: lynx: don't print an_enabled in pcs_get_state()
an_enabled will be going away, and in any case, pcs_get_state() should
not be updating this member. Remove the print.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/pcs/pcs-lynx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/pcs/pcs-lynx.c
+++ b/drivers/net/pcs/pcs-lynx.c
@@ -115,11 +115,11 @@ static void lynx_pcs_get_state(struct ph
}
dev_dbg(&lynx->mdio->dev,
- "mode=%s/%s/%s link=%u an_enabled=%u an_complete=%u\n",
+ "mode=%s/%s/%s link=%u an_complete=%u\n",
phy_modes(state->interface),
phy_speed_to_str(state->speed),
phy_duplex_to_str(state->duplex),
- state->link, state->an_enabled, state->an_complete);
+ state->link, state->an_complete);
}
static int lynx_pcs_config_giga(struct mdio_device *pcs, unsigned int mode,

View File

@ -1,32 +0,0 @@
From 99d0f3a1095f4c938b1665025c29411edafe8a01 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 21 Mar 2023 15:58:44 +0000
Subject: [PATCH] net: dpaa2-mac: use Autoneg bit rather than an_enabled
The Autoneg bit in the advertising bitmap and state->an_enabled are
always identical. Thus, we will be removing state->an_enabled.
Use the Autoneg bit in the advertising bitmap to indicate whether
autonegotiation should be used, rather than using the an_enabled
member which will be going away. This means we use the same condition
as phylink_mii_c22_pcs_config().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
@@ -158,7 +158,8 @@ static void dpaa2_mac_config(struct phyl
struct dpmac_link_state *dpmac_state = &mac->state;
int err;
- if (state->an_enabled)
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising))
dpmac_state->options |= DPMAC_LINK_OPT_AUTONEG;
else
dpmac_state->options &= ~DPMAC_LINK_OPT_AUTONEG;

View File

@ -1,64 +0,0 @@
From 471c40bde606ec0b1ce8c616f7998739c7a783a6 Mon Sep 17 00:00:00 2001
From: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Date: Fri, 10 Feb 2023 18:46:27 +0300
Subject: [PATCH 10/21] net: phylink: support validated pause and autoneg in
fixed-link
In a fixed-link setup phylink_parse_fixedlink() unconditionally sets
the Pause, Asym_Pause and Autoneg bits in the "supported" bitmap, while
the MAC may not support these.
This leads to ethtool reporting:
> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes
regardless of what is actually supported.
Instead of unconditionally setting Pause, Asym_Pause and Autoneg, it is
sensible to set them according to the validated "supported" bitmap, i.e.
the result of phylink_validate().
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phylink.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -709,6 +709,7 @@ static int phylink_parse_fixedlink(struc
struct fwnode_handle *fwnode)
{
struct fwnode_handle *fixed_node;
+ bool pause, asym_pause, autoneg;
const struct phy_setting *s;
struct gpio_desc *desc;
u32 speed;
@@ -781,13 +782,23 @@ static int phylink_parse_fixedlink(struc
linkmode_copy(pl->link_config.advertising, pl->supported);
phylink_validate(pl, pl->supported, &pl->link_config);
+ pause = phylink_test(pl->supported, Pause);
+ asym_pause = phylink_test(pl->supported, Asym_Pause);
+ autoneg = phylink_test(pl->supported, Autoneg);
s = phy_lookup_setting(pl->link_config.speed, pl->link_config.duplex,
pl->supported, true);
linkmode_zero(pl->supported);
phylink_set(pl->supported, MII);
- phylink_set(pl->supported, Pause);
- phylink_set(pl->supported, Asym_Pause);
- phylink_set(pl->supported, Autoneg);
+
+ if (pause)
+ phylink_set(pl->supported, Pause);
+
+ if (asym_pause)
+ phylink_set(pl->supported, Asym_Pause);
+
+ if (autoneg)
+ phylink_set(pl->supported, Autoneg);
+
if (s) {
__set_bit(s->bit, pl->supported);
__set_bit(s->bit, pl->link_config.lp_advertising);

View File

@ -1,177 +0,0 @@
From 7211ffd70941933a7825a56cf480f07ee81c321c Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 21 Mar 2023 15:58:54 +0000
Subject: [PATCH 11/21] net: phylink: remove an_enabled
The Autoneg bit in the advertising bitmap and state->an_enabled are
always identical. state->an_enabled is no longer used by any
drivers, so let's kill this duplication.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 37 +++++++++++++++++--------------------
include/linux/phylink.h | 2 --
2 files changed, 17 insertions(+), 22 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -841,7 +841,6 @@ static int phylink_parse_mode(struct phy
phylink_set(pl->supported, Autoneg);
phylink_set(pl->supported, Asym_Pause);
phylink_set(pl->supported, Pause);
- pl->link_config.an_enabled = true;
pl->cfg_link_an_mode = MLO_AN_INBAND;
switch (pl->link_config.interface) {
@@ -944,9 +943,6 @@ static int phylink_parse_mode(struct phy
"failed to validate link configuration for in-band status\n");
return -EINVAL;
}
-
- /* Check if MAC/PCS also supports Autoneg. */
- pl->link_config.an_enabled = phylink_test(pl->supported, Autoneg);
}
return 0;
@@ -956,7 +952,8 @@ static void phylink_apply_manual_flow(st
struct phylink_link_state *state)
{
/* If autoneg is disabled, pause AN is also disabled */
- if (!state->an_enabled)
+ if (!linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising))
state->pause &= ~MLO_PAUSE_AN;
/* Manual configuration of pause modes */
@@ -996,21 +993,22 @@ static void phylink_mac_config(struct ph
const struct phylink_link_state *state)
{
phylink_dbg(pl,
- "%s: mode=%s/%s/%s/%s/%s adv=%*pb pause=%02x link=%u an=%u\n",
+ "%s: mode=%s/%s/%s/%s/%s adv=%*pb pause=%02x link=%u\n",
__func__, phylink_an_mode_str(pl->cur_link_an_mode),
phy_modes(state->interface),
phy_speed_to_str(state->speed),
phy_duplex_to_str(state->duplex),
phy_rate_matching_to_str(state->rate_matching),
__ETHTOOL_LINK_MODE_MASK_NBITS, state->advertising,
- state->pause, state->link, state->an_enabled);
+ state->pause, state->link);
pl->mac_ops->mac_config(pl->config, pl->cur_link_an_mode, state);
}
static void phylink_mac_pcs_an_restart(struct phylink *pl)
{
- if (pl->link_config.an_enabled &&
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ pl->link_config.advertising) &&
phy_interface_mode_is_8023z(pl->link_config.interface) &&
phylink_autoneg_inband(pl->cur_link_an_mode)) {
if (pl->pcs)
@@ -1137,9 +1135,9 @@ static void phylink_mac_pcs_get_state(st
linkmode_copy(state->advertising, pl->link_config.advertising);
linkmode_zero(state->lp_advertising);
state->interface = pl->link_config.interface;
- state->an_enabled = pl->link_config.an_enabled;
state->rate_matching = pl->link_config.rate_matching;
- if (state->an_enabled) {
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising)) {
state->speed = SPEED_UNKNOWN;
state->duplex = DUPLEX_UNKNOWN;
state->pause = MLO_PAUSE_NONE;
@@ -1531,7 +1529,6 @@ struct phylink *phylink_create(struct ph
pl->link_config.pause = MLO_PAUSE_AN;
pl->link_config.speed = SPEED_UNKNOWN;
pl->link_config.duplex = DUPLEX_UNKNOWN;
- pl->link_config.an_enabled = true;
pl->mac_ops = mac_ops;
__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
@@ -2155,8 +2152,9 @@ static void phylink_get_ksettings(const
kset->base.speed = state->speed;
kset->base.duplex = state->duplex;
}
- kset->base.autoneg = state->an_enabled ? AUTONEG_ENABLE :
- AUTONEG_DISABLE;
+ kset->base.autoneg = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising) ?
+ AUTONEG_ENABLE : AUTONEG_DISABLE;
}
/**
@@ -2303,9 +2301,8 @@ int phylink_ethtool_ksettings_set(struct
/* We have ruled out the case with a PHY attached, and the
* fixed-link cases. All that is left are in-band links.
*/
- config.an_enabled = kset->base.autoneg == AUTONEG_ENABLE;
linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising,
- config.an_enabled);
+ kset->base.autoneg == AUTONEG_ENABLE);
/* If this link is with an SFP, ensure that changes to advertised modes
* also cause the associated interface to be selected such that the
@@ -2339,13 +2336,14 @@ int phylink_ethtool_ksettings_set(struct
}
/* If autonegotiation is enabled, we must have an advertisement */
- if (config.an_enabled && phylink_is_empty_linkmode(config.advertising))
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ config.advertising) &&
+ phylink_is_empty_linkmode(config.advertising))
return -EINVAL;
mutex_lock(&pl->state_mutex);
pl->link_config.speed = config.speed;
pl->link_config.duplex = config.duplex;
- pl->link_config.an_enabled = config.an_enabled;
if (pl->link_config.interface != config.interface) {
/* The interface changed, e.g. 1000base-X <-> 2500base-X */
@@ -2951,7 +2949,6 @@ static int phylink_sfp_config_phy(struct
config.speed = SPEED_UNKNOWN;
config.duplex = DUPLEX_UNKNOWN;
config.pause = MLO_PAUSE_AN;
- config.an_enabled = pl->link_config.an_enabled;
/* Ignore errors if we're expecting a PHY to attach later */
ret = phylink_validate(pl, support, &config);
@@ -3020,7 +3017,6 @@ static int phylink_sfp_config_optical(st
config.speed = SPEED_UNKNOWN;
config.duplex = DUPLEX_UNKNOWN;
config.pause = MLO_PAUSE_AN;
- config.an_enabled = true;
/* For all the interfaces that are supported, reduce the sfp_support
* mask to only those link modes that can be supported.
@@ -3354,7 +3350,8 @@ void phylink_mii_c22_pcs_decode_state(st
/* If there is no link or autonegotiation is disabled, the LP advertisement
* data is not meaningful, so don't go any further.
*/
- if (!state->link || !state->an_enabled)
+ if (!state->link || !linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising))
return;
switch (state->interface) {
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -93,7 +93,6 @@ static inline bool phylink_autoneg_inban
* the medium link mode (@speed and @duplex) and the speed/duplex of the phy
* interface mode (@interface) are different.
* @link: true if the link is up.
- * @an_enabled: true if autonegotiation is enabled/desired.
* @an_complete: true if autonegotiation has completed.
*/
struct phylink_link_state {
@@ -105,7 +104,6 @@ struct phylink_link_state {
int pause;
int rate_matching;
unsigned int link:1;
- unsigned int an_enabled:1;
unsigned int an_complete:1;
};

View File

@ -1,88 +0,0 @@
From a3555d1f5c208f0a63eafee77381f68d304a0512 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 12 May 2023 17:58:37 +0100
Subject: [PATCH 12/21] net: phylink: constify fwnode arguments
Both phylink_create() and phylink_fwnode_phy_connect() do not modify
the fwnode argument that they are passed, so let's constify these.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phylink.c | 11 ++++++-----
include/linux/phylink.h | 9 +++++----
2 files changed, 11 insertions(+), 9 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -706,7 +706,7 @@ static int phylink_validate(struct phyli
}
static int phylink_parse_fixedlink(struct phylink *pl,
- struct fwnode_handle *fwnode)
+ const struct fwnode_handle *fwnode)
{
struct fwnode_handle *fixed_node;
bool pause, asym_pause, autoneg;
@@ -817,7 +817,8 @@ static int phylink_parse_fixedlink(struc
return 0;
}
-static int phylink_parse_mode(struct phylink *pl, struct fwnode_handle *fwnode)
+static int phylink_parse_mode(struct phylink *pl,
+ const struct fwnode_handle *fwnode)
{
struct fwnode_handle *dn;
const char *managed;
@@ -1440,7 +1441,7 @@ static void phylink_fixed_poll(struct ti
static const struct sfp_upstream_ops sfp_phylink_ops;
static int phylink_register_sfp(struct phylink *pl,
- struct fwnode_handle *fwnode)
+ const struct fwnode_handle *fwnode)
{
struct sfp_bus *bus;
int ret;
@@ -1479,7 +1480,7 @@ static int phylink_register_sfp(struct p
* must use IS_ERR() to check for errors from this function.
*/
struct phylink *phylink_create(struct phylink_config *config,
- struct fwnode_handle *fwnode,
+ const struct fwnode_handle *fwnode,
phy_interface_t iface,
const struct phylink_mac_ops *mac_ops)
{
@@ -1809,7 +1810,7 @@ EXPORT_SYMBOL_GPL(phylink_of_phy_connect
* Returns 0 on success or a negative errno.
*/
int phylink_fwnode_phy_connect(struct phylink *pl,
- struct fwnode_handle *fwnode,
+ const struct fwnode_handle *fwnode,
u32 flags)
{
struct fwnode_handle *phy_fwnode;
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -568,16 +568,17 @@ void phylink_generic_validate(struct phy
unsigned long *supported,
struct phylink_link_state *state);
-struct phylink *phylink_create(struct phylink_config *, struct fwnode_handle *,
- phy_interface_t iface,
- const struct phylink_mac_ops *mac_ops);
+struct phylink *phylink_create(struct phylink_config *,
+ const struct fwnode_handle *,
+ phy_interface_t,
+ const struct phylink_mac_ops *);
void phylink_destroy(struct phylink *);
bool phylink_expects_phy(struct phylink *pl);
int phylink_connect_phy(struct phylink *, struct phy_device *);
int phylink_of_phy_connect(struct phylink *, struct device_node *, u32 flags);
int phylink_fwnode_phy_connect(struct phylink *pl,
- struct fwnode_handle *fwnode,
+ const struct fwnode_handle *fwnode,
u32 flags);
void phylink_disconnect_phy(struct phylink *);

View File

@ -1,38 +0,0 @@
From 4a0faa02d419a6728abef0f1d8a32d8c35ef95e6 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 24 Mar 2023 09:23:53 +0000
Subject: [PATCH] net: phy: constify fwnode_get_phy_node() fwnode argument
fwnode_get_phy_node() does not modify the fwnode structure, so make
the argument const.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phy_device.c | 2 +-
include/linux/phy.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3003,7 +3003,7 @@ EXPORT_SYMBOL_GPL(device_phy_find_device
* and "phy-device" are not supported in ACPI. DT supports all the three
* named references to the phy node.
*/
-struct fwnode_handle *fwnode_get_phy_node(struct fwnode_handle *fwnode)
+struct fwnode_handle *fwnode_get_phy_node(const struct fwnode_handle *fwnode)
{
struct fwnode_handle *phy_node;
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1473,7 +1473,7 @@ int fwnode_get_phy_id(struct fwnode_hand
struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode);
struct phy_device *fwnode_phy_find_device(struct fwnode_handle *phy_fwnode);
struct phy_device *device_phy_find_device(struct device *dev);
-struct fwnode_handle *fwnode_get_phy_node(struct fwnode_handle *fwnode);
+struct fwnode_handle *fwnode_get_phy_node(const struct fwnode_handle *fwnode);
struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45);
int phy_device_register(struct phy_device *phy);
void phy_device_free(struct phy_device *phydev);

View File

@ -1,44 +0,0 @@
From cc73de0411f7d3cdd157564a78f7a39058420ff8 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Sat, 13 May 2023 22:03:45 +0100
Subject: [PATCH 13/21] net: phylink: fix ksettings_set() ethtool call
While testing a Fiberstore SFP-10G-T module (which uses 10GBASE-R with
rate adaptation) in a Clearfog platform (which can't do that), it was
found that the PHY's advertisement was not limited according to the
host's capabilities when using ethtool to change it.
Fix this by ensuring that we mask the advertisement with the computed
support mask as the very first thing we do.
Fixes: cbc1bb1e4689 ("net: phylink: simplify phy case for ksettings_set method")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phylink.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2226,6 +2226,10 @@ int phylink_ethtool_ksettings_set(struct
ASSERT_RTNL();
+ /* Mask out unsupported advertisements */
+ linkmode_and(config.advertising, kset->link_modes.advertising,
+ pl->supported);
+
if (pl->phydev) {
/* We can rely on phylib for this update; we also do not need
* to update the pl->link_config settings:
@@ -2250,10 +2254,6 @@ int phylink_ethtool_ksettings_set(struct
config = pl->link_config;
- /* Mask out unsupported advertisements */
- linkmode_and(config.advertising, kset->link_modes.advertising,
- pl->supported);
-
/* FIXME: should we reject autoneg if phy/mac does not support it? */
switch (kset->base.autoneg) {
case AUTONEG_DISABLE:

View File

@ -1,149 +0,0 @@
From 0100d1c5789018ba77bf2f4fab3bd91ecece7b3b Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Wed, 17 May 2023 11:38:12 +0100
Subject: [PATCH 14/21] net: sfp: add support for setting signalling rate
Add support to the SFP layer to allow phylink to set the signalling
rate for an SFP module. The rate given will be in units of kilo-baud
(1000 baud).
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 24 ++++++++++++++++++++++++
drivers/net/phy/sfp-bus.c | 20 ++++++++++++++++++++
drivers/net/phy/sfp.c | 5 +++++
drivers/net/phy/sfp.h | 1 +
include/linux/sfp.h | 6 ++++++
5 files changed, 56 insertions(+)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -156,6 +156,23 @@ static const char *phylink_an_mode_str(u
return mode < ARRAY_SIZE(modestr) ? modestr[mode] : "unknown";
}
+static unsigned int phylink_interface_signal_rate(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_1000BASEX: /* 1.25Mbd */
+ return 1250;
+ case PHY_INTERFACE_MODE_2500BASEX: /* 3.125Mbd */
+ return 3125;
+ case PHY_INTERFACE_MODE_5GBASER: /* 5.15625Mbd */
+ return 5156;
+ case PHY_INTERFACE_MODE_10GBASER: /* 10.3125Mbd */
+ return 10313;
+ default:
+ return 0;
+ }
+}
+
/**
* phylink_interface_max_speed() - get the maximum speed of a phy interface
* @interface: phy interface mode defined by &typedef phy_interface_t
@@ -1024,6 +1041,7 @@ static void phylink_major_config(struct
{
struct phylink_pcs *pcs = NULL;
bool pcs_changed = false;
+ unsigned int rate_kbd;
int err;
phylink_dbg(pl, "major config %s\n", phy_modes(state->interface));
@@ -1083,6 +1101,12 @@ static void phylink_major_config(struct
ERR_PTR(err));
}
+ if (pl->sfp_bus) {
+ rate_kbd = phylink_interface_signal_rate(state->interface);
+ if (rate_kbd)
+ sfp_upstream_set_signal_rate(pl->sfp_bus, rate_kbd);
+ }
+
phylink_pcs_poll_start(pl);
}
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -586,6 +586,26 @@ static void sfp_upstream_clear(struct sf
}
/**
+ * sfp_upstream_set_signal_rate() - set data signalling rate
+ * @bus: a pointer to the &struct sfp_bus structure for the sfp module
+ * @rate_kbd: signalling rate in units of 1000 baud
+ *
+ * Configure the rate select settings on the SFP module for the signalling
+ * rate (not the same as the data rate).
+ *
+ * Locks that may be held:
+ * Phylink's state_mutex
+ * rtnl lock
+ * SFP's sm_mutex
+ */
+void sfp_upstream_set_signal_rate(struct sfp_bus *bus, unsigned int rate_kbd)
+{
+ if (bus->registered)
+ bus->socket_ops->set_signal_rate(bus->sfp, rate_kbd);
+}
+EXPORT_SYMBOL_GPL(sfp_upstream_set_signal_rate);
+
+/**
* sfp_bus_find_fwnode() - parse and locate the SFP bus from fwnode
* @fwnode: firmware node for the parent device (MAC or PHY)
*
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -2474,6 +2474,10 @@ static void sfp_stop(struct sfp *sfp)
sfp_sm_event(sfp, SFP_E_DEV_DOWN);
}
+static void sfp_set_signal_rate(struct sfp *sfp, unsigned int rate_kbd)
+{
+}
+
static int sfp_module_info(struct sfp *sfp, struct ethtool_modinfo *modinfo)
{
/* locking... and check module is present */
@@ -2552,6 +2556,7 @@ static const struct sfp_socket_ops sfp_m
.detach = sfp_detach,
.start = sfp_start,
.stop = sfp_stop,
+ .set_signal_rate = sfp_set_signal_rate,
.module_info = sfp_module_info,
.module_eeprom = sfp_module_eeprom,
.module_eeprom_by_page = sfp_module_eeprom_by_page,
--- a/drivers/net/phy/sfp.h
+++ b/drivers/net/phy/sfp.h
@@ -19,6 +19,7 @@ struct sfp_socket_ops {
void (*detach)(struct sfp *sfp);
void (*start)(struct sfp *sfp);
void (*stop)(struct sfp *sfp);
+ void (*set_signal_rate)(struct sfp *sfp, unsigned int rate_kbd);
int (*module_info)(struct sfp *sfp, struct ethtool_modinfo *modinfo);
int (*module_eeprom)(struct sfp *sfp, struct ethtool_eeprom *ee,
u8 *data);
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -547,6 +547,7 @@ int sfp_get_module_eeprom_by_page(struct
struct netlink_ext_ack *extack);
void sfp_upstream_start(struct sfp_bus *bus);
void sfp_upstream_stop(struct sfp_bus *bus);
+void sfp_upstream_set_signal_rate(struct sfp_bus *bus, unsigned int rate_kbd);
void sfp_bus_put(struct sfp_bus *bus);
struct sfp_bus *sfp_bus_find_fwnode(const struct fwnode_handle *fwnode);
int sfp_bus_add_upstream(struct sfp_bus *bus, void *upstream,
@@ -606,6 +607,11 @@ static inline void sfp_upstream_stop(str
{
}
+static inline void sfp_upstream_set_signal_rate(struct sfp_bus *bus,
+ unsigned int rate_kbd)
+{
+}
+
static inline void sfp_bus_put(struct sfp_bus *bus)
{
}
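
To put the units in perspective: 10GBASE-R's 10.3125 GBd line rate is
passed as rate_kbd = 10313. The in-kernel sfp_set_signal_rate() above is
still an empty stub; a socket-side consumer could look like the sketch
below (the threshold and foo_* helper are assumptions, not kernel API):

/* Hypothetical handler: assert rate select above an illustrative cutoff */
#define FOO_RATE_SELECT_THRESHOLD_KBD	4250

static void foo_set_signal_rate(struct sfp *sfp, unsigned int rate_kbd)
{
	/* e.g. 10GBASE-R arrives as rate_kbd == 10313 (10.3125 GBd) */
	foo_drive_rs0_gpio(sfp, rate_kbd > FOO_RATE_SELECT_THRESHOLD_KBD);
}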

View File

@ -1,147 +0,0 @@
From b84acdb07222a701bfc6403b374249c86f97d18d Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Fri, 19 May 2023 14:03:59 +0100
Subject: [PATCH 15/21] net: phy: add helpers for comparing phy IDs
There are several places which open-code PHY ID comparisons. Provide a
couple of helpers to assist with this, using a slightly simpler test
than the original:
- phy_id_compare() compares two arbitrary PHY IDs and a mask of the
significant bits in the ID.
- phydev_id_compare() compares the bound phydev with the specified
PHY ID, using the bound driver's mask.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/micrel.c | 6 +++---
drivers/net/phy/phy_device.c | 16 +++++++---------
drivers/net/phy/phylink.c | 4 ++--
include/linux/phy.h | 28 ++++++++++++++++++++++++++++
4 files changed, 40 insertions(+), 14 deletions(-)
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -620,7 +620,7 @@ static int ksz8051_ksz8795_match_phy_dev
{
int ret;
- if ((phydev->phy_id & MICREL_PHY_ID_MASK) != PHY_ID_KSZ8051)
+ if (!phy_id_compare(phydev->phy_id, PHY_ID_KSZ8051, MICREL_PHY_ID_MASK))
return 0;
ret = phy_read(phydev, MII_BMSR);
@@ -1455,7 +1455,7 @@ static int ksz9x31_cable_test_fault_leng
*
* distance to fault = (VCT_DATA - 22) * 4 / cable propagation velocity
*/
- if ((phydev->phy_id & MICREL_PHY_ID_MASK) == PHY_ID_KSZ9131)
+ if (phydev_id_compare(phydev, PHY_ID_KSZ9131))
dt = clamp(dt - 22, 0, 255);
return (dt * 400) / 10;
@@ -1887,7 +1887,7 @@ static __always_inline int ksz886x_cable
*/
dt = FIELD_GET(data_mask, status);
- if ((phydev->phy_id & MICREL_PHY_ID_MASK) == PHY_ID_LAN8814)
+ if (phydev_id_compare(phydev, PHY_ID_LAN8814))
return ((dt - 22) * 800) / 10;
else
return (dt * 400) / 10;
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -422,8 +422,7 @@ int phy_unregister_fixup(const char *bus
fixup = list_entry(pos, struct phy_fixup, list);
if ((!strcmp(fixup->bus_id, bus_id)) &&
- ((fixup->phy_uid & phy_uid_mask) ==
- (phy_uid & phy_uid_mask))) {
+ phy_id_compare(fixup->phy_uid, phy_uid, phy_uid_mask)) {
list_del(&fixup->list);
kfree(fixup);
ret = 0;
@@ -459,8 +458,8 @@ static int phy_needs_fixup(struct phy_de
if (strcmp(fixup->bus_id, PHY_ANY_ID) != 0)
return 0;
- if ((fixup->phy_uid & fixup->phy_uid_mask) !=
- (phydev->phy_id & fixup->phy_uid_mask))
+ if (!phy_id_compare(phydev->phy_id, fixup->phy_uid,
+ fixup->phy_uid_mask))
if (fixup->phy_uid != PHY_ANY_UID)
return 0;
@@ -507,15 +506,14 @@ static int phy_bus_match(struct device *
if (phydev->c45_ids.device_ids[i] == 0xffffffff)
continue;
- if ((phydrv->phy_id & phydrv->phy_id_mask) ==
- (phydev->c45_ids.device_ids[i] &
- phydrv->phy_id_mask))
+ if (phy_id_compare(phydev->c45_ids.device_ids[i],
+ phydrv->phy_id, phydrv->phy_id_mask))
return 1;
}
return 0;
} else {
- return (phydrv->phy_id & phydrv->phy_id_mask) ==
- (phydev->phy_id & phydrv->phy_id_mask);
+ return phy_id_compare(phydev->phy_id, phydrv->phy_id,
+ phydrv->phy_id_mask);
}
}
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3151,8 +3151,8 @@ static void phylink_sfp_link_up(void *up
*/
static bool phylink_phy_no_inband(struct phy_device *phy)
{
- return phy->is_c45 &&
- (phy->c45_ids.device_ids[1] & 0xfffffff0) == 0xae025150;
+ return phy->is_c45 && phy_id_compare(phy->c45_ids.device_ids[1],
+ 0xae025150, 0xfffffff0);
}
static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -993,6 +993,34 @@ struct phy_driver {
#define PHY_ID_MATCH_MODEL(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 4)
#define PHY_ID_MATCH_VENDOR(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 10)
+/**
+ * phy_id_compare - compare @id1 with @id2 taking account of @mask
+ * @id1: first PHY ID
+ * @id2: second PHY ID
+ * @mask: the PHY ID mask, set bits are significant in matching
+ *
+ * Return true if the bits from @id1 and @id2 specified by @mask match.
+ * This uses an equivalent test to (@id & @mask) == (@phy_id & @mask).
+ */
+static inline bool phy_id_compare(u32 id1, u32 id2, u32 mask)
+{
+ return !((id1 ^ id2) & mask);
+}
+
+/**
+ * phydev_id_compare - compare @id with the PHY's Clause 22 ID
+ * @phydev: the PHY device
+ * @id: the PHY ID to be matched
+ *
+ * Compare the @phydev clause 22 ID with the provided @id and return true or
+ * false depending whether it matches, using the bound driver mask. The
+ * @phydev must be bound to a driver.
+ */
+static inline bool phydev_id_compare(struct phy_device *phydev, u32 id)
+{
+ return phy_id_compare(id, phydev->phy_id, phydev->drv->phy_id_mask);
+}
+
/* A Structure for boards to register fixups with the PHY Lib */
struct phy_fixup {
struct list_head list;
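
The simpler test works because XOR sets exactly the bits where the two
IDs differ; masking then keeps only the significant differing bits, and
a zero result means a match. A worked example with illustrative values
(not a real PHY ID):

/* Worked example of the masked compare used by phy_id_compare() */
u32 id1  = 0x00221556;	/* device ID read from the PHY (illustrative) */
u32 id2  = 0x00221550;	/* driver's match ID (illustrative) */
u32 mask = 0xfffffff0;	/* low revision nibble is not significant */

/* id1 ^ id2 == 0x00000006: only revision bits differ.
 * 0x00000006 & 0xfffffff0 == 0, so the IDs match under the mask,
 * exactly as (id1 & mask) == (id2 & mask) would conclude.
 */
bool match = !((id1 ^ id2) & mask);	/* true */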

View File

@ -1,71 +0,0 @@
From 441e1e44301fc5762a06737f8ec04bf1ce3fb039 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Sat, 20 May 2023 11:41:42 +0100
Subject: [PATCH 16/21] net: phylink: require supported_interfaces to be filled
We have been requiring the supported_interfaces bitmap to be filled in
by MAC drivers that have a mac_select_pcs() method. Now that all MAC
drivers fill in the supported_interfaces bitmap, it is time to enforce
this. We have already required supported_interfaces to be set in order
for optical SFPs to be configured in commit f81fa96d8a6c ("net: phylink:
use phy_interface_t bitmaps for optical modules").
Refuse phylink creation if supported_interfaces is empty, and remove
code to deal with cases where this mask is empty.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1q0K1u-006EIP-ET@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -710,14 +710,11 @@ static int phylink_validate(struct phyli
{
const unsigned long *interfaces = pl->config->supported_interfaces;
- if (!phy_interface_empty(interfaces)) {
- if (state->interface == PHY_INTERFACE_MODE_NA)
- return phylink_validate_mask(pl, supported, state,
- interfaces);
+ if (state->interface == PHY_INTERFACE_MODE_NA)
+ return phylink_validate_mask(pl, supported, state, interfaces);
- if (!test_bit(state->interface, interfaces))
- return -EINVAL;
- }
+ if (!test_bit(state->interface, interfaces))
+ return -EINVAL;
return phylink_validate_mac_and_pcs(pl, supported, state);
}
@@ -1512,19 +1509,18 @@ struct phylink *phylink_create(struct ph
struct phylink *pl;
int ret;
- if (mac_ops->mac_select_pcs &&
- mac_ops->mac_select_pcs(config, PHY_INTERFACE_MODE_NA) !=
- ERR_PTR(-EOPNOTSUPP))
- using_mac_select_pcs = true;
-
/* Validate the supplied configuration */
- if (using_mac_select_pcs &&
- phy_interface_empty(config->supported_interfaces)) {
+ if (phy_interface_empty(config->supported_interfaces)) {
dev_err(config->dev,
- "phylink: error: empty supported_interfaces but mac_select_pcs() method present\n");
+ "phylink: error: empty supported_interfaces\n");
return ERR_PTR(-EINVAL);
}
+ if (mac_ops->mac_select_pcs &&
+ mac_ops->mac_select_pcs(config, PHY_INTERFACE_MODE_NA) !=
+ ERR_PTR(-EOPNOTSUPP))
+ using_mac_select_pcs = true;
+
pl = kzalloc(sizeof(*pl), GFP_KERNEL);
if (!pl)
return ERR_PTR(-ENOMEM);
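
Since every MAC must now populate the bitmap before phylink_create(), a
typical probe path does something like the sketch below (priv and the
chosen interface modes are illustrative; the calls themselves are the
real API):

/* Fill supported_interfaces before phylink_create(); empty => -EINVAL */
__set_bit(PHY_INTERFACE_MODE_SGMII,
	  priv->phylink_config.supported_interfaces);
__set_bit(PHY_INTERFACE_MODE_1000BASEX,
	  priv->phylink_config.supported_interfaces);

pl = phylink_create(&priv->phylink_config, fwnode,
		    PHY_INTERFACE_MODE_SGMII, &foo_mac_ops);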

View File

@ -1,64 +0,0 @@
From 4b624a39f2ab523ca6a6ad9448fab1deb7b101e2 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 23 May 2023 11:15:53 +0100
Subject: [PATCH 17/21] net: phylink: remove duplicated linkmode pause
resolution
Phylink had two virtually identical chunks of code for resolving the
negotiated pause modes. Factor this down to one function.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -976,11 +976,10 @@ static void phylink_apply_manual_flow(st
state->pause = pl->link_config.pause;
}
-static void phylink_resolve_flow(struct phylink_link_state *state)
+static void phylink_resolve_an_pause(struct phylink_link_state *state)
{
bool tx_pause, rx_pause;
- state->pause = MLO_PAUSE_NONE;
if (state->duplex == DUPLEX_FULL) {
linkmode_resolve_pause(state->advertising,
state->lp_advertising,
@@ -1192,7 +1191,8 @@ static void phylink_get_fixed_state(stru
else if (pl->link_gpio)
state->link = !!gpiod_get_value_cansleep(pl->link_gpio);
- phylink_resolve_flow(state);
+ state->pause = MLO_PAUSE_NONE;
+ phylink_resolve_an_pause(state);
}
static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
@@ -3215,7 +3215,6 @@ static const struct sfp_upstream_ops sfp
static void phylink_decode_c37_word(struct phylink_link_state *state,
uint16_t config_reg, int speed)
{
- bool tx_pause, rx_pause;
int fd_bit;
if (speed == SPEED_2500)
@@ -3234,13 +3233,7 @@ static void phylink_decode_c37_word(stru
state->link = false;
}
- linkmode_resolve_pause(state->advertising, state->lp_advertising,
- &tx_pause, &rx_pause);
-
- if (tx_pause)
- state->pause |= MLO_PAUSE_TX;
- if (rx_pause)
- state->pause |= MLO_PAUSE_RX;
+ phylink_resolve_an_pause(state);
}
static void phylink_decode_sgmii_word(struct phylink_link_state *state,

View File

@ -1,76 +0,0 @@
From aa8b6bd2b1f235b262bd27f317a0516f196c2c6a Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 23 May 2023 11:15:58 +0100
Subject: [PATCH 18/21] net: phylink: add function to resolve clause 73
negotiation
Add a function to resolve clause 73 negotiation according to the
priority resolution function described in clause 73.3.6.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/phylink.h | 2 ++
2 files changed, 41 insertions(+)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3212,6 +3212,45 @@ static const struct sfp_upstream_ops sfp
/* Helpers for MAC drivers */
+static struct {
+ int bit;
+ int speed;
+} phylink_c73_priority_resolution[] = {
+ { ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT, SPEED_100000 },
+ { ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT, SPEED_100000 },
+ /* 100GBASE-KP4 and 100GBASE-CR10 not supported */
+ { ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT, SPEED_40000 },
+ { ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT, SPEED_40000 },
+ { ETHTOOL_LINK_MODE_10000baseKR_Full_BIT, SPEED_10000 },
+ { ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT, SPEED_10000 },
+ /* 5GBASE-KR not supported */
+ { ETHTOOL_LINK_MODE_2500baseX_Full_BIT, SPEED_2500 },
+ { ETHTOOL_LINK_MODE_1000baseKX_Full_BIT, SPEED_1000 },
+};
+
+void phylink_resolve_c73(struct phylink_link_state *state)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(phylink_c73_priority_resolution); i++) {
+ int bit = phylink_c73_priority_resolution[i].bit;
+ if (linkmode_test_bit(bit, state->advertising) &&
+ linkmode_test_bit(bit, state->lp_advertising))
+ break;
+ }
+
+ if (i < ARRAY_SIZE(phylink_c73_priority_resolution)) {
+ state->speed = phylink_c73_priority_resolution[i].speed;
+ state->duplex = DUPLEX_FULL;
+ } else {
+ /* negotiation failure */
+ state->link = false;
+ }
+
+ phylink_resolve_an_pause(state);
+}
+EXPORT_SYMBOL_GPL(phylink_resolve_c73);
+
static void phylink_decode_c37_word(struct phylink_link_state *state,
uint16_t config_reg, int speed)
{
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -656,6 +656,8 @@ int phylink_mii_c22_pcs_config(struct md
const unsigned long *advertising);
void phylink_mii_c22_pcs_an_restart(struct mdio_device *pcs);
+void phylink_resolve_c73(struct phylink_link_state *state);
+
void phylink_mii_c45_pcs_get_state(struct mdio_device *pcs,
struct phylink_link_state *state);
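
A clause-73 PCS would use the helper from its pcs_get_state() method
once it has read the link partner's advertisement. A minimal sketch,
assuming hypothetical foo_* register accessors (phylink fills
state->advertising before invoking this method):

static void foo_pcs_get_state(struct phylink_pcs *pcs,
			      struct phylink_link_state *state)
{
	state->link = foo_read_link(pcs);
	state->an_complete = foo_read_an_complete(pcs);
	if (!state->link || !state->an_complete)
		return;

	foo_read_lp_advertising(pcs, state->lp_advertising);

	/* Picks the highest-priority common mode (clause 73.3.6) and
	 * resolves pause; clears state->link on negotiation failure.
	 */
	phylink_resolve_c73(state);
}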

View File

@ -1,100 +0,0 @@
From 796d709363135a6bd6a8ccc07b509c939e5b855f Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 23 May 2023 16:31:50 +0100
Subject: [PATCH 19/21] net: phylink: provide phylink_pcs_config() and
phylink_pcs_link_up()
Add two helper functions for calling PCS methods. phylink_pcs_config()
allows us to handle PCS configuration specifics in one location rather
than at the two call sites. phylink_pcs_link_up() gives us consistency.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q1TzK-007Exd-Rs@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 53 ++++++++++++++++++++++++---------------
1 file changed, 33 insertions(+), 20 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -991,6 +991,25 @@ static void phylink_resolve_an_pause(str
}
}
+static int phylink_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
+ const struct phylink_link_state *state,
+ bool permit_pause_to_mac)
+{
+ if (!pcs)
+ return 0;
+
+ return pcs->ops->pcs_config(pcs, mode, state->interface,
+ state->advertising, permit_pause_to_mac);
+}
+
+static void phylink_pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
+ phy_interface_t interface, int speed,
+ int duplex)
+{
+ if (pcs && pcs->ops->pcs_link_up)
+ pcs->ops->pcs_link_up(pcs, mode, interface, speed, duplex);
+}
+
static void phylink_pcs_poll_stop(struct phylink *pl)
{
if (pl->cfg_link_an_mode == MLO_AN_INBAND)
@@ -1074,18 +1093,15 @@ static void phylink_major_config(struct
phylink_mac_config(pl, state);
- if (pl->pcs) {
- err = pl->pcs->ops->pcs_config(pl->pcs, pl->cur_link_an_mode,
- state->interface,
- state->advertising,
- !!(pl->link_config.pause &
- MLO_PAUSE_AN));
- if (err < 0)
- phylink_err(pl, "pcs_config failed: %pe\n",
- ERR_PTR(err));
- if (err > 0)
- restart = true;
- }
+ err = phylink_pcs_config(pl->pcs, pl->cur_link_an_mode, state,
+ !!(pl->link_config.pause &
+ MLO_PAUSE_AN));
+ if (err < 0)
+ phylink_err(pl, "pcs_config failed: %pe\n",
+ ERR_PTR(err));
+ else if (err > 0)
+ restart = true;
+
if (restart)
phylink_mac_pcs_an_restart(pl);
@@ -1136,11 +1152,9 @@ static int phylink_change_inband_advert(
* restart negotiation if the pcs_config() helper indicates that
* the programmed advertisement has changed.
*/
- ret = pl->pcs->ops->pcs_config(pl->pcs, pl->cur_link_an_mode,
- pl->link_config.interface,
- pl->link_config.advertising,
- !!(pl->link_config.pause &
- MLO_PAUSE_AN));
+ ret = phylink_pcs_config(pl->pcs, pl->cur_link_an_mode,
+ &pl->link_config,
+ !!(pl->link_config.pause & MLO_PAUSE_AN));
if (ret < 0)
return ret;
@@ -1272,9 +1286,8 @@ static void phylink_link_up(struct phyli
pl->cur_interface = link_state.interface;
- if (pl->pcs && pl->pcs->ops->pcs_link_up)
- pl->pcs->ops->pcs_link_up(pl->pcs, pl->cur_link_an_mode,
- pl->cur_interface, speed, duplex);
+ phylink_pcs_link_up(pl->pcs, pl->cur_link_an_mode, pl->cur_interface,
+ speed, duplex);
pl->mac_ops->mac_link_up(pl->config, pl->phydev, pl->cur_link_an_mode,
pl->cur_interface, speed, duplex,

View File

@ -1,55 +0,0 @@
From 11933aa76865621d8e82553c8f3bc07796a5aaa2 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 1 Jun 2023 10:12:06 +0100
Subject: [PATCH 20/21] net: phylink: actually fix ksettings_set() ethtool call
Raju Lakkaraju reported that the below commit caused a regression
with Lan743x drivers and a 2.5G SFP. Sadly, this is because the commit
was utterly wrong. Let's fix this properly by not moving the
linkmode_and(), but instead copying the link ksettings and then
modifying the advertising mask before passing the modified link
ksettings to phylib.
Fixes: df0acdc59b09 ("net: phylink: fix ksettings_set() ethtool call")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q4eLm-00Ayxk-GZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2259,11 +2259,13 @@ int phylink_ethtool_ksettings_set(struct
ASSERT_RTNL();
- /* Mask out unsupported advertisements */
- linkmode_and(config.advertising, kset->link_modes.advertising,
- pl->supported);
-
if (pl->phydev) {
+ struct ethtool_link_ksettings phy_kset = *kset;
+
+ linkmode_and(phy_kset.link_modes.advertising,
+ phy_kset.link_modes.advertising,
+ pl->supported);
+
/* We can rely on phylib for this update; we also do not need
* to update the pl->link_config settings:
* - the configuration returned via ksettings_get() will come
@@ -2282,10 +2284,13 @@ int phylink_ethtool_ksettings_set(struct
* the presence of a PHY, this should not be changed as that
* should be determined from the media side advertisement.
*/
- return phy_ethtool_ksettings_set(pl->phydev, kset);
+ return phy_ethtool_ksettings_set(pl->phydev, &phy_kset);
}
config = pl->link_config;
+ /* Mask out unsupported advertisements */
+ linkmode_and(config.advertising, kset->link_modes.advertising,
+ pl->supported);
/* FIXME: should we reject autoneg if phy/mac does not support it? */
switch (kset->base.autoneg) {

View File

@ -1,324 +0,0 @@
From 79b07c3e9c4a2272927be8848c26b372516e1958 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 16 Jun 2023 13:06:22 +0100
Subject: [PATCH 21/21] net: phylink: add PCS negotiation mode
PCS drivers have to work out whether they should enable PCS negotiation
by looking at the "mode" and "interface" arguments, and the Autoneg bit
in the advertising mask.
This leads to some complex logic, so let's pull that out into phylink
and instead pass a "neg_mode" argument to the PCS configuration and
link-up methods in place of the "mode" argument.
In order to transition drivers, add a "neg_mode" flag to the phylink
PCS structure so a PCS can indicate whether it wants to be passed the
neg_mode or the old mode argument.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8De-00EaFA-Ht@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 45 +++++++++++++----
include/linux/phylink.h | 104 +++++++++++++++++++++++++++++++++++---
2 files changed, 132 insertions(+), 17 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -71,6 +71,7 @@ struct phylink {
struct mutex state_mutex;
struct phylink_link_state phy_state;
struct work_struct resolve;
+ unsigned int pcs_neg_mode;
bool mac_link_dropped;
bool using_mac_select_pcs;
@@ -991,23 +992,23 @@ static void phylink_resolve_an_pause(str
}
}
-static int phylink_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
+static int phylink_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
const struct phylink_link_state *state,
bool permit_pause_to_mac)
{
if (!pcs)
return 0;
- return pcs->ops->pcs_config(pcs, mode, state->interface,
+ return pcs->ops->pcs_config(pcs, neg_mode, state->interface,
state->advertising, permit_pause_to_mac);
}
-static void phylink_pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
+static void phylink_pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface, int speed,
int duplex)
{
if (pcs && pcs->ops->pcs_link_up)
- pcs->ops->pcs_link_up(pcs, mode, interface, speed, duplex);
+ pcs->ops->pcs_link_up(pcs, neg_mode, interface, speed, duplex);
}
static void phylink_pcs_poll_stop(struct phylink *pl)
@@ -1057,10 +1058,15 @@ static void phylink_major_config(struct
struct phylink_pcs *pcs = NULL;
bool pcs_changed = false;
unsigned int rate_kbd;
+ unsigned int neg_mode;
int err;
phylink_dbg(pl, "major config %s\n", phy_modes(state->interface));
+ pl->pcs_neg_mode = phylink_pcs_neg_mode(pl->cur_link_an_mode,
+ state->interface,
+ state->advertising);
+
if (pl->using_mac_select_pcs) {
pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
if (IS_ERR(pcs)) {
@@ -1093,9 +1099,12 @@ static void phylink_major_config(struct
phylink_mac_config(pl, state);
- err = phylink_pcs_config(pl->pcs, pl->cur_link_an_mode, state,
- !!(pl->link_config.pause &
- MLO_PAUSE_AN));
+ neg_mode = pl->cur_link_an_mode;
+ if (pl->pcs && pl->pcs->neg_mode)
+ neg_mode = pl->pcs_neg_mode;
+
+ err = phylink_pcs_config(pl->pcs, neg_mode, state,
+ !!(pl->link_config.pause & MLO_PAUSE_AN));
if (err < 0)
phylink_err(pl, "pcs_config failed: %pe\n",
ERR_PTR(err));
@@ -1130,6 +1139,7 @@ static void phylink_major_config(struct
*/
static int phylink_change_inband_advert(struct phylink *pl)
{
+ unsigned int neg_mode;
int ret;
if (test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state))
@@ -1148,12 +1158,20 @@ static int phylink_change_inband_advert(
__ETHTOOL_LINK_MODE_MASK_NBITS, pl->link_config.advertising,
pl->link_config.pause);
+ /* Recompute the PCS neg mode */
+ pl->pcs_neg_mode = phylink_pcs_neg_mode(pl->cur_link_an_mode,
+ pl->link_config.interface,
+ pl->link_config.advertising);
+
+ neg_mode = pl->cur_link_an_mode;
+ if (pl->pcs->neg_mode)
+ neg_mode = pl->pcs_neg_mode;
+
/* Modern PCS-based method; update the advert at the PCS, and
* restart negotiation if the pcs_config() helper indicates that
* the programmed advertisement has changed.
*/
- ret = phylink_pcs_config(pl->pcs, pl->cur_link_an_mode,
- &pl->link_config,
+ ret = phylink_pcs_config(pl->pcs, neg_mode, &pl->link_config,
!!(pl->link_config.pause & MLO_PAUSE_AN));
if (ret < 0)
return ret;
@@ -1256,6 +1274,7 @@ static void phylink_link_up(struct phyli
struct phylink_link_state link_state)
{
struct net_device *ndev = pl->netdev;
+ unsigned int neg_mode;
int speed, duplex;
bool rx_pause;
@@ -1286,8 +1305,12 @@ static void phylink_link_up(struct phyli
pl->cur_interface = link_state.interface;
- phylink_pcs_link_up(pl->pcs, pl->cur_link_an_mode, pl->cur_interface,
- speed, duplex);
+ neg_mode = pl->cur_link_an_mode;
+ if (pl->pcs && pl->pcs->neg_mode)
+ neg_mode = pl->pcs_neg_mode;
+
+ phylink_pcs_link_up(pl->pcs, neg_mode, pl->cur_interface, speed,
+ duplex);
pl->mac_ops->mac_link_up(pl->config, pl->phydev, pl->cur_link_an_mode,
pl->cur_interface, speed, duplex,
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -21,6 +21,24 @@ enum {
MLO_AN_FIXED, /* Fixed-link mode */
MLO_AN_INBAND, /* In-band protocol */
+ /* PCS "negotiation" mode.
+ * PHYLINK_PCS_NEG_NONE - protocol has no inband capability
+ * PHYLINK_PCS_NEG_OUTBAND - some out of band or fixed link setting
+ * PHYLINK_PCS_NEG_INBAND_DISABLED - inband mode disabled, e.g.
+ * 1000base-X with autoneg off
+ * PHYLINK_PCS_NEG_INBAND_ENABLED - inband mode enabled
+ * Additionally, this can be tested using bitmasks:
+ * PHYLINK_PCS_NEG_INBAND - inband mode selected
+ * PHYLINK_PCS_NEG_ENABLED - negotiation mode enabled
+ */
+ PHYLINK_PCS_NEG_NONE = 0,
+ PHYLINK_PCS_NEG_ENABLED = BIT(4),
+ PHYLINK_PCS_NEG_OUTBAND = BIT(5),
+ PHYLINK_PCS_NEG_INBAND = BIT(6),
+ PHYLINK_PCS_NEG_INBAND_DISABLED = PHYLINK_PCS_NEG_INBAND,
+ PHYLINK_PCS_NEG_INBAND_ENABLED = PHYLINK_PCS_NEG_INBAND |
+ PHYLINK_PCS_NEG_ENABLED,
+
/* MAC_SYM_PAUSE and MAC_ASYM_PAUSE are used when configuring our
* autonegotiation advertisement. They correspond to the PAUSE and
* ASM_DIR bits defined by 802.3, respectively.
@@ -80,6 +98,70 @@ static inline bool phylink_autoneg_inban
}
/**
+ * phylink_pcs_neg_mode() - helper to determine PCS inband mode
+ * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND.
+ * @interface: interface mode to be used
+ * @advertising: adertisement ethtool link mode mask
+ *
+ * Determines the negotiation mode to be used by the PCS, and returns
+ * one of:
+ * %PHYLINK_PCS_NEG_NONE: interface mode does not support inband
+ * %PHYLINK_PCS_NEG_OUTBAND: an out of band mode (e.g. reading the PHY)
+ * will be used.
+ * %PHYLINK_PCS_NEG_INBAND_DISABLED: inband mode selected but autoneg disabled
+ * %PHYLINK_PCS_NEG_INBAND_ENABLED: inband mode selected and autoneg enabled
+ *
+ * Note: this is for cases where the PCS itself is involved in negotiation
+ * (e.g. Clause 37, SGMII and similar) not Clause 73.
+ */
+static inline unsigned int phylink_pcs_neg_mode(unsigned int mode,
+ phy_interface_t interface,
+ const unsigned long *advertising)
+{
+ unsigned int neg_mode;
+
+ switch (interface) {
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_QSGMII:
+ case PHY_INTERFACE_MODE_QUSGMII:
+ case PHY_INTERFACE_MODE_USXGMII:
+ /* These protocols are designed for use with a PHY which
+ * communicates its negotiation result back to the MAC via
+ * inband communication. Note: there exist PHYs that run
+ * with SGMII but do not send the inband data.
+ */
+ if (!phylink_autoneg_inband(mode))
+ neg_mode = PHYLINK_PCS_NEG_OUTBAND;
+ else
+ neg_mode = PHYLINK_PCS_NEG_INBAND_ENABLED;
+ break;
+
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ /* 1000base-X is designed for use media-side for Fibre
+ * connections, and thus the Autoneg bit needs to be
+ * taken into account. We also do this for 2500base-X
+ * as well, but drivers may not support this, so may
+ * need to override this.
+ */
+ if (!phylink_autoneg_inband(mode))
+ neg_mode = PHYLINK_PCS_NEG_OUTBAND;
+ else if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ advertising))
+ neg_mode = PHYLINK_PCS_NEG_INBAND_ENABLED;
+ else
+ neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED;
+ break;
+
+ default:
+ neg_mode = PHYLINK_PCS_NEG_NONE;
+ break;
+ }
+
+ return neg_mode;
+}
+
+/**
* struct phylink_link_state - link state structure
* @advertising: ethtool bitmask containing advertised link modes
* @lp_advertising: ethtool bitmask containing link partner advertised link
@@ -436,6 +518,7 @@ struct phylink_pcs_ops;
/**
* struct phylink_pcs - PHYLINK PCS instance
* @ops: a pointer to the &struct phylink_pcs_ops structure
+ * @neg_mode: provide PCS neg mode via "mode" argument
* @poll: poll the PCS for link changes
*
* This structure is designed to be embedded within the PCS private data,
@@ -443,6 +526,7 @@ struct phylink_pcs_ops;
*/
struct phylink_pcs {
const struct phylink_pcs_ops *ops;
+ bool neg_mode;
bool poll;
};
@@ -460,12 +544,12 @@ struct phylink_pcs_ops {
const struct phylink_link_state *state);
void (*pcs_get_state)(struct phylink_pcs *pcs,
struct phylink_link_state *state);
- int (*pcs_config)(struct phylink_pcs *pcs, unsigned int mode,
+ int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface,
const unsigned long *advertising,
bool permit_pause_to_mac);
void (*pcs_an_restart)(struct phylink_pcs *pcs);
- void (*pcs_link_up)(struct phylink_pcs *pcs, unsigned int mode,
+ void (*pcs_link_up)(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface, int speed, int duplex);
};
@@ -508,7 +592,7 @@ void pcs_get_state(struct phylink_pcs *p
/**
* pcs_config() - Configure the PCS mode and advertisement
* @pcs: a pointer to a &struct phylink_pcs.
- * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND.
+ * @neg_mode: link negotiation mode (see below)
* @interface: interface mode to be used
* @advertising: adertisement ethtool link mode mask
* @permit_pause_to_mac: permit forwarding pause resolution to MAC
@@ -526,8 +610,12 @@ void pcs_get_state(struct phylink_pcs *p
* For 1000BASE-X, the advertisement should be programmed into the PCS.
*
* For most 10GBASE-R, there is no advertisement.
+ *
+ * The %neg_mode argument should be tested via the phylink_mode_*() family of
+ * functions, or for PCS that set pcs->neg_mode true, should be tested
+ * against the %PHYLINK_PCS_NEG_* definitions.
*/
-int pcs_config(struct phylink_pcs *pcs, unsigned int mode,
+int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface, const unsigned long *advertising,
bool permit_pause_to_mac);
@@ -543,7 +631,7 @@ void pcs_an_restart(struct phylink_pcs *
/**
* pcs_link_up() - program the PCS for the resolved link configuration
* @pcs: a pointer to a &struct phylink_pcs.
- * @mode: link autonegotiation mode
+ * @neg_mode: link negotiation mode (see below)
* @interface: link &typedef phy_interface_t mode
* @speed: link speed
* @duplex: link duplex
@@ -552,8 +640,12 @@ void pcs_an_restart(struct phylink_pcs *
* the resolved link parameters. For example, a PCS operating in SGMII
* mode without in-band AN needs to be manually configured for the link
* and duplex setting. Otherwise, this should be a no-op.
+ *
+ * The %mode argument should be tested via the phylink_mode_*() family of
+ * functions, or for PCS that set pcs->neg_mode true, should be tested
+ * against the %PHYLINK_PCS_NEG_* definitions.
*/
-void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
+void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface, int speed, int duplex);
#endif
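
A converted PCS opts in by setting pcs->neg_mode, after which phylink
passes PHYLINK_PCS_NEG_* values (rather than MLO_AN_*) into pcs_config()
and pcs_link_up(); these can be tested exactly or as bitmasks. A sketch
with illustrative foo_* names:

static int foo_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
			  phy_interface_t interface,
			  const unsigned long *advertising,
			  bool permit_pause_to_mac)
{
	u16 bmcr = 0;

	/* Bitmask test: true for INBAND_DISABLED and INBAND_ENABLED */
	if (neg_mode & PHYLINK_PCS_NEG_INBAND)
		foo_enable_inband_signalling(pcs);

	/* Exact test: inband autoneg is actually negotiating */
	if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED)
		bmcr = BMCR_ANENABLE;

	return foo_write_bmcr(pcs, bmcr);
}

/* At probe time, opt in so the PCS methods receive PHYLINK_PCS_NEG_*: */
foo->pcs.ops = &foo_pcs_ops;
foo->pcs.neg_mode = true;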

View File

@ -1,45 +0,0 @@
From cdb08aa0473730315dbc088d5394e59622314034 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 16 Jun 2023 13:06:27 +0100
Subject: [PATCH 1/2] net: phylink: convert phylink_mii_c22_pcs_config() to
neg_mode
Use phylink_pcs_neg_mode() for phylink_mii_c22_pcs_config(). This
results in no functional change.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8Dj-00EaFG-Mt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/phy/phylink.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3558,6 +3558,7 @@ int phylink_mii_c22_pcs_config(struct md
phy_interface_t interface,
const unsigned long *advertising)
{
+ unsigned int neg_mode;
bool changed = 0;
u16 bmcr;
int ret, adv;
@@ -3571,15 +3572,13 @@ int phylink_mii_c22_pcs_config(struct md
changed = ret;
}
- /* Ensure ISOLATE bit is disabled */
- if (mode == MLO_AN_INBAND &&
- (interface == PHY_INTERFACE_MODE_SGMII ||
- interface == PHY_INTERFACE_MODE_QSGMII ||
- linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, advertising)))
+ neg_mode = phylink_pcs_neg_mode(mode, interface, advertising);
+ if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED)
bmcr = BMCR_ANENABLE;
else
bmcr = 0;
+ /* Configure the inband state. Ensure ISOLATE bit is disabled */
ret = mdiodev_modify(pcs, MII_BMCR, BMCR_ANENABLE | BMCR_ISOLATE, bmcr);
if (ret < 0)
return ret;

View File

@ -1,187 +0,0 @@
From febf2aaf05641f3258cc30e072aff65cffc7c82c Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 16 Jun 2023 13:06:32 +0100
Subject: [PATCH 2/2] net: phylink: pass neg_mode into
phylink_mii_c22_pcs_config()
Convert fman_dtsec, xilinx_axienet and pcs-lynx to pass the neg_mode
into phylink_mii_c22_pcs_config(). Where appropriate, drivers are
updated to have neg_mode passed into their pcs_config() and
pcs_link_up() functions. For other drivers, we just hoist the
phylink_pcs_neg_mode() call out of phylink_mii_c22_pcs_config() and
into their pcs_config() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8Do-00EaFM-Ra@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
.../net/ethernet/freescale/fman/fman_dtsec.c | 7 ++++---
.../net/ethernet/xilinx/xilinx_axienet_main.c | 6 ++++--
drivers/net/pcs/pcs-lynx.c | 18 ++++++++++++------
drivers/net/phy/phylink.c | 9 ++++-----
include/linux/phylink.h | 5 +++--
5 files changed, 27 insertions(+), 18 deletions(-)
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
@@ -763,15 +763,15 @@ static void dtsec_pcs_get_state(struct p
phylink_mii_c22_pcs_get_state(dtsec->tbidev, state);
}
-static int dtsec_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
+static int dtsec_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface,
const unsigned long *advertising,
bool permit_pause_to_mac)
{
struct fman_mac *dtsec = pcs_to_dtsec(pcs);
- return phylink_mii_c22_pcs_config(dtsec->tbidev, mode, interface,
- advertising);
+ return phylink_mii_c22_pcs_config(dtsec->tbidev, interface,
+ advertising, neg_mode);
}
static void dtsec_pcs_an_restart(struct phylink_pcs *pcs)
@@ -1447,6 +1447,7 @@ int dtsec_initialization(struct mac_devi
goto _return_fm_mac_free;
}
dtsec->pcs.ops = &dtsec_pcs_ops;
+ dtsec->pcs.neg_mode = true;
dtsec->pcs.poll = true;
supported = mac_dev->phylink_config.supported_interfaces;
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1631,7 +1631,7 @@ static void axienet_pcs_an_restart(struc
phylink_mii_c22_pcs_an_restart(pcs_phy);
}
-static int axienet_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
+static int axienet_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface,
const unsigned long *advertising,
bool permit_pause_to_mac)
@@ -1653,7 +1653,8 @@ static int axienet_pcs_config(struct phy
}
}
- ret = phylink_mii_c22_pcs_config(pcs_phy, mode, interface, advertising);
+ ret = phylink_mii_c22_pcs_config(pcs_phy, interface, advertising,
+ neg_mode);
if (ret < 0)
netdev_warn(ndev, "Failed to configure PCS: %d\n", ret);
@@ -2129,6 +2130,7 @@ static int axienet_probe(struct platform
}
of_node_put(np);
lp->pcs.ops = &axienet_pcs_ops;
+ lp->pcs.neg_mode = true;
lp->pcs.poll = true;
}
--- a/drivers/net/pcs/pcs-lynx.c
+++ b/drivers/net/pcs/pcs-lynx.c
@@ -122,9 +122,10 @@ static void lynx_pcs_get_state(struct ph
state->link, state->an_complete);
}
-static int lynx_pcs_config_giga(struct mdio_device *pcs, unsigned int mode,
+static int lynx_pcs_config_giga(struct mdio_device *pcs,
phy_interface_t interface,
- const unsigned long *advertising)
+ const unsigned long *advertising,
+ unsigned int neg_mode)
{
u32 link_timer;
u16 if_mode;
@@ -137,8 +138,9 @@ static int lynx_pcs_config_giga(struct m
if_mode = 0;
} else {
+ /* SGMII and QSGMII */
if_mode = IF_MODE_SGMII_EN;
- if (mode == MLO_AN_INBAND) {
+ if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED) {
if_mode |= IF_MODE_USE_SGMII_AN;
/* Adjust link timer for SGMII */
@@ -154,7 +156,8 @@ static int lynx_pcs_config_giga(struct m
if (err)
return err;
- return phylink_mii_c22_pcs_config(pcs, mode, interface, advertising);
+ return phylink_mii_c22_pcs_config(pcs, interface, advertising,
+ neg_mode);
}
static int lynx_pcs_config_usxgmii(struct mdio_device *pcs, unsigned int mode,
@@ -181,13 +184,16 @@ static int lynx_pcs_config(struct phylin
bool permit)
{
struct lynx_pcs *lynx = phylink_pcs_to_lynx(pcs);
+ unsigned int neg_mode;
+
+ neg_mode = phylink_pcs_neg_mode(mode, ifmode, advertising);
switch (ifmode) {
case PHY_INTERFACE_MODE_1000BASEX:
case PHY_INTERFACE_MODE_SGMII:
case PHY_INTERFACE_MODE_QSGMII:
- return lynx_pcs_config_giga(lynx->mdio, mode, ifmode,
- advertising);
+ return lynx_pcs_config_giga(lynx->mdio, ifmode, advertising,
+ neg_mode);
case PHY_INTERFACE_MODE_2500BASEX:
if (phylink_autoneg_inband(mode)) {
dev_err(&lynx->mdio->dev,
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3545,20 +3545,20 @@ EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_en
/**
* phylink_mii_c22_pcs_config() - configure clause 22 PCS
* @pcs: a pointer to a &struct mdio_device.
- * @mode: link autonegotiation mode
* @interface: the PHY interface mode being configured
* @advertising: the ethtool advertisement mask
+ * @neg_mode: PCS negotiation mode
*
* Configure a Clause 22 PCS PHY with the appropriate negotiation
* parameters for the @mode, @interface and @advertising parameters.
* Returns negative error number on failure, zero if the advertisement
* has not changed, or positive if there is a change.
*/
-int phylink_mii_c22_pcs_config(struct mdio_device *pcs, unsigned int mode,
+int phylink_mii_c22_pcs_config(struct mdio_device *pcs,
phy_interface_t interface,
- const unsigned long *advertising)
+ const unsigned long *advertising,
+ unsigned int neg_mode)
{
- unsigned int neg_mode;
bool changed = 0;
u16 bmcr;
int ret, adv;
@@ -3572,7 +3572,6 @@ int phylink_mii_c22_pcs_config(struct md
changed = ret;
}
- neg_mode = phylink_pcs_neg_mode(mode, interface, advertising);
if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED)
bmcr = BMCR_ANENABLE;
else
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -743,9 +743,10 @@ void phylink_mii_c22_pcs_get_state(struc
struct phylink_link_state *state);
int phylink_mii_c22_pcs_encode_advertisement(phy_interface_t interface,
const unsigned long *advertising);
-int phylink_mii_c22_pcs_config(struct mdio_device *pcs, unsigned int mode,
+int phylink_mii_c22_pcs_config(struct mdio_device *pcs,
phy_interface_t interface,
- const unsigned long *advertising);
+ const unsigned long *advertising,
+ unsigned int neg_mode);
void phylink_mii_c22_pcs_an_restart(struct mdio_device *pcs);
void phylink_resolve_c73(struct phylink_link_state *state);
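
For a PCS that has not opted in to neg_mode, the conversion is to
compute neg_mode itself before calling the helper, mirroring what
pcs-lynx does above. A sketch assuming a hypothetical foo_pcs wrapper
struct around the phylink_pcs:

static int foo_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
			  phy_interface_t interface,
			  const unsigned long *advertising,
			  bool permit_pause_to_mac)
{
	struct foo_pcs *fpcs = container_of(pcs, struct foo_pcs, pcs);
	unsigned int neg_mode;

	/* Hoisted out of phylink_mii_c22_pcs_config() by this patch */
	neg_mode = phylink_pcs_neg_mode(mode, interface, advertising);

	return phylink_mii_c22_pcs_config(fpcs->mdio, interface,
					  advertising, neg_mode);
}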

View File

@ -1,101 +0,0 @@
From 3b2de56a146f34e3f70a84cc3a1897064e445d16 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Fri, 16 Jun 2023 13:06:43 +0100
Subject: [PATCH] net: pcs: lynxi: update PCS driver to use neg_mode
Update the Lynxi PCS driver to use neg_mode rather than the mode
argument. This ensures that the link_up() method will always program
the speed and duplex when negotiation is disabled.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8Dz-00EaFY-5A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/pcs/pcs-mtk-lynxi.c | 39 ++++++++++++++-------------------
1 file changed, 16 insertions(+), 23 deletions(-)
--- a/drivers/net/pcs/pcs-mtk-lynxi.c
+++ b/drivers/net/pcs/pcs-mtk-lynxi.c
@@ -102,13 +102,13 @@ static void mtk_pcs_lynxi_get_state(stru
FIELD_GET(SGMII_LPA, adv));
}
-static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int mode,
+static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int neg_mode,
phy_interface_t interface,
const unsigned long *advertising,
bool permit_pause_to_mac)
{
struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
- bool mode_changed = false, changed, use_an;
+ bool mode_changed = false, changed;
unsigned int rgc3, sgm_mode, bmcr;
int advertise, link_timer;
@@ -121,30 +121,22 @@ static int mtk_pcs_lynxi_config(struct p
* we assume that fixes it's speed at bitrate = line rate (in
* other words, 1000Mbps or 2500Mbps).
*/
- if (interface == PHY_INTERFACE_MODE_SGMII) {
+ if (interface == PHY_INTERFACE_MODE_SGMII)
sgm_mode = SGMII_IF_MODE_SGMII;
- if (phylink_autoneg_inband(mode)) {
- sgm_mode |= SGMII_REMOTE_FAULT_DIS |
- SGMII_SPEED_DUPLEX_AN;
- use_an = true;
- } else {
- use_an = false;
- }
- } else if (phylink_autoneg_inband(mode)) {
- /* 1000base-X or 2500base-X autoneg */
- sgm_mode = SGMII_REMOTE_FAULT_DIS;
- use_an = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
- advertising);
- } else {
+ else
/* 1000base-X or 2500base-X without autoneg */
sgm_mode = 0;
- use_an = false;
- }
- if (use_an)
+ if (neg_mode & PHYLINK_PCS_NEG_INBAND)
+ sgm_mode |= SGMII_REMOTE_FAULT_DIS;
+
+ if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED) {
+ if (interface == PHY_INTERFACE_MODE_SGMII)
+ sgm_mode |= SGMII_SPEED_DUPLEX_AN;
bmcr = BMCR_ANENABLE;
- else
+ } else {
bmcr = 0;
+ }
if (mpcs->interface != interface) {
link_timer = phylink_get_link_timer_ns(interface);
@@ -216,14 +208,15 @@ static void mtk_pcs_lynxi_restart_an(str
regmap_set_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1, BMCR_ANRESTART);
}
-static void mtk_pcs_lynxi_link_up(struct phylink_pcs *pcs, unsigned int mode,
+static void mtk_pcs_lynxi_link_up(struct phylink_pcs *pcs,
+ unsigned int neg_mode,
phy_interface_t interface, int speed,
int duplex)
{
struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
unsigned int sgm_mode;
- if (!phylink_autoneg_inband(mode)) {
+ if (neg_mode != PHYLINK_PCS_NEG_INBAND_ENABLED) {
/* Force the speed and duplex setting */
if (speed == SPEED_10)
sgm_mode = SGMII_SPEED_10;
@@ -286,6 +279,7 @@ struct phylink_pcs *mtk_pcs_lynxi_create
mpcs->regmap = regmap;
mpcs->flags = flags;
mpcs->pcs.ops = &mtk_pcs_lynxi_ops;
+ mpcs->pcs.neg_mode = true;
mpcs->pcs.poll = true;
mpcs->interface = PHY_INTERFACE_MODE_NA;
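
The two neg_mode tests above are easy to conflate: PHYLINK_PCS_NEG_INBAND is a flag bit matching any in-band mode, while PHYLINK_PCS_NEG_INBAND_ENABLED is the composite value that additionally carries the autoneg-enabled bit. A standalone sketch of the encoding (the values are assumed to mirror include/linux/phylink.h at the time of this series; exact bit positions may differ between kernel versions):

#include <stdio.h>

#define BIT(n) (1U << (n))
#define PHYLINK_PCS_NEG_ENABLED		BIT(4)
#define PHYLINK_PCS_NEG_INBAND		BIT(6)
#define PHYLINK_PCS_NEG_INBAND_DISABLED	PHYLINK_PCS_NEG_INBAND
#define PHYLINK_PCS_NEG_INBAND_ENABLED \
	(PHYLINK_PCS_NEG_INBAND | PHYLINK_PCS_NEG_ENABLED)

int main(void)
{
	unsigned int modes[] = { PHYLINK_PCS_NEG_INBAND_DISABLED,
				 PHYLINK_PCS_NEG_INBAND_ENABLED };

	for (int i = 0; i < 2; i++) {
		unsigned int m = modes[i];

		/* "& INBAND" is true for both; "== INBAND_ENABLED" only
		 * when in-band autoneg is actually on */
		printf("mode %#x: inband=%d autoneg=%d\n", m,
		       !!(m & PHYLINK_PCS_NEG_INBAND),
		       m == PHYLINK_PCS_NEG_INBAND_ENABLED);
	}
	return 0;
}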

View File

@@ -1,55 +0,0 @@
From 459fd2f11204c962e3153020f4f56748e0e10afb Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 21 Mar 2023 15:58:49 +0000
Subject: [PATCH] net: pcs: xpcs: use Autoneg bit rather than an_enabled
The Autoneg bit in the advertising bitmap and state->an_enabled are
always identical. Thus, we will be removing state->an_enabled.
Use the Autoneg bit in the advertising bitmap to indicate whether
autonegotiation should be used, rather than using the an_enabled
member which will be going away.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/pcs/pcs-xpcs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/net/pcs/pcs-xpcs.c
+++ b/drivers/net/pcs/pcs-xpcs.c
@@ -931,6 +931,7 @@ static int xpcs_get_state_c73(struct dw_
struct phylink_link_state *state,
const struct xpcs_compat *compat)
{
+ bool an_enabled;
int ret;
/* Link needs to be read first ... */
@@ -948,11 +949,13 @@ static int xpcs_get_state_c73(struct dw_
return xpcs_do_config(xpcs, state->interface, MLO_AN_INBAND, NULL);
}
- if (state->an_enabled && xpcs_aneg_done_c73(xpcs, state, compat)) {
+ an_enabled = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising);
+ if (an_enabled && xpcs_aneg_done_c73(xpcs, state, compat)) {
state->an_complete = true;
xpcs_read_lpa_c73(xpcs, state);
xpcs_resolve_lpa_c73(xpcs, state);
- } else if (state->an_enabled) {
+ } else if (an_enabled) {
state->link = 0;
} else if (state->link) {
xpcs_resolve_pma(xpcs, state);
@@ -1007,7 +1010,8 @@ static int xpcs_get_state_c37_1000basex(
{
int lpa, bmsr;
- if (state->an_enabled) {
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ state->advertising)) {
/* Reset link state */
state->link = false;
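
The conversion pattern is mechanical: wherever state->an_enabled was read, test the Autoneg bit in state->advertising instead. A self-contained sketch of the bitmap test, using a simplified stand-in for the kernel's <linux/linkmode.h> helper (the bit value 6 comes from the ethtool UAPI):

#include <stdbool.h>
#include <stdio.h>

#define ETHTOOL_LINK_MODE_Autoneg_BIT	6	/* from uapi ethtool.h */
#define BITS_PER_LONG	(8 * (int)sizeof(unsigned long))

/* simplified stand-in for the kernel's linkmode_test_bit() */
static bool linkmode_test_bit(int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

int main(void)
{
	unsigned long advertising[2] = { 0 };

	/* the advertising mask is now the single source of truth;
	 * there is no separate an_enabled flag to keep in sync */
	advertising[0] |= 1UL << ETHTOOL_LINK_MODE_Autoneg_BIT;

	printf("autoneg enabled: %d\n",
	       linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
				 advertising));
	return 0;
}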

View File

@@ -1,30 +0,0 @@
From 43fb622d91a9f408322735d2f736495c1009f575 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Tue, 9 May 2023 12:50:04 +0100
Subject: [PATCH] net: pcs: xpcs: fix incorrect number of interfaces
In synopsys_xpcs_compat[], the DW_XPCS_2500BASEX entry was setting
the number of interfaces using the xpcs_2500basex_features array
rather than xpcs_2500basex_interfaces. This causes us to overflow
the array of interfaces. Fix this.
Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/pcs/pcs-xpcs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/pcs/pcs-xpcs.c
+++ b/drivers/net/pcs/pcs-xpcs.c
@@ -1211,7 +1211,7 @@ static const struct xpcs_compat synopsys
[DW_XPCS_2500BASEX] = {
.supported = xpcs_2500basex_features,
.interface = xpcs_2500basex_interfaces,
- .num_interfaces = ARRAY_SIZE(xpcs_2500basex_features),
+ .num_interfaces = ARRAY_SIZE(xpcs_2500basex_interfaces),
.an_mode = DW_2500BASEX,
},
};
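
The bug class here is worth a second look: ARRAY_SIZE() silently accepts any array, so sizing a loop with the wrong one compiles cleanly and overflows at runtime. A standalone sketch with illustrative array contents (the real arrays hold PHY interface modes):

#include <stdio.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* illustrative stand-ins: features is longer than interfaces, so
 * bounding the interface walk by the features array overflows */
static const int xpcs_2500basex_features[]   = { 10, 11, 12, 13, 14 };
static const int xpcs_2500basex_interfaces[] = { 42 };

int main(void)
{
	/* correct: bound the loop by the array actually indexed */
	for (size_t i = 0; i < ARRAY_SIZE(xpcs_2500basex_interfaces); i++)
		printf("interface[%zu] = %d\n", i,
		       xpcs_2500basex_interfaces[i]);

	printf("wrong bound would have been %zu entries\n",
	       ARRAY_SIZE(xpcs_2500basex_features));
	return 0;
}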

View File

@@ -1,26 +0,0 @@
From 1b9827ceab08450308f7971d6fd700ec88b6ce67 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 3 Dec 2022 14:20:37 +0100
Subject: [PATCH 098/250] net: mtk_eth_soc: enable flow offload support for
MT7986 SoC
Since the Wireless Ethernet Dispatcher is now available for mt7986 in mt76,
enable hw flow offload support for the MT7986 SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/fdcaacd827938e6a8c4aa1ac2c13e46d2c08c821.1670072898.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -4333,6 +4333,7 @@ static const struct mtk_soc_data mt7986_
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7986_CLKS_BITMAP,
.required_pctl = false,
+ .offload_version = 2,
.hash_offset = 4,
.foe_entry_size = sizeof(struct mtk_foe_entry),
.txrx = {

View File

@@ -1,591 +0,0 @@
From: Sujuan Chen <sujuan.chen@mediatek.com>
Date: Sat, 5 Nov 2022 23:36:18 +0100
Subject: [PATCH] net: ethernet: mtk_wed: introduce wed mcu support
Introduce WED MCU support used to configure the WED WO chip.
This is a preliminary patch in order to add RX Wireless
Ethernet Dispatch support available on the MT7986 SoC.
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
create mode 100644 drivers/net/ethernet/mediatek/mtk_wed_mcu.c
create mode 100644 drivers/net/ethernet/mediatek/mtk_wed_wo.h
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -5,7 +5,7 @@
obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o
-mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed.o
+mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed.o mtk_wed_mcu.o
ifdef CONFIG_DEBUG_FS
mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed_debugfs.o
endif
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
@@ -0,0 +1,359 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2022 MediaTek Inc.
+ *
+ * Author: Lorenzo Bianconi <lorenzo@kernel.org>
+ * Sujuan Chen <sujuan.chen@mediatek.com>
+ */
+
+#include <linux/firmware.h>
+#include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/mfd/syscon.h>
+#include <linux/soc/mediatek/mtk_wed.h>
+
+#include "mtk_wed_regs.h"
+#include "mtk_wed_wo.h"
+#include "mtk_wed.h"
+
+static u32 wo_r32(struct mtk_wed_wo *wo, u32 reg)
+{
+ return readl(wo->boot.addr + reg);
+}
+
+static void wo_w32(struct mtk_wed_wo *wo, u32 reg, u32 val)
+{
+ writel(val, wo->boot.addr + reg);
+}
+
+static struct sk_buff *
+mtk_wed_mcu_msg_alloc(const void *data, int data_len)
+{
+ int length = sizeof(struct mtk_wed_mcu_hdr) + data_len;
+ struct sk_buff *skb;
+
+ skb = alloc_skb(length, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ memset(skb->head, 0, length);
+ skb_reserve(skb, sizeof(struct mtk_wed_mcu_hdr));
+ if (data && data_len)
+ skb_put_data(skb, data, data_len);
+
+ return skb;
+}
+
+static struct sk_buff *
+mtk_wed_mcu_get_response(struct mtk_wed_wo *wo, unsigned long expires)
+{
+ if (!time_is_after_jiffies(expires))
+ return NULL;
+
+ wait_event_timeout(wo->mcu.wait, !skb_queue_empty(&wo->mcu.res_q),
+ expires - jiffies);
+ return skb_dequeue(&wo->mcu.res_q);
+}
+
+void mtk_wed_mcu_rx_event(struct mtk_wed_wo *wo, struct sk_buff *skb)
+{
+ skb_queue_tail(&wo->mcu.res_q, skb);
+ wake_up(&wo->mcu.wait);
+}
+
+void mtk_wed_mcu_rx_unsolicited_event(struct mtk_wed_wo *wo,
+ struct sk_buff *skb)
+{
+ struct mtk_wed_mcu_hdr *hdr = (struct mtk_wed_mcu_hdr *)skb->data;
+
+ switch (hdr->cmd) {
+ case MTK_WED_WO_EVT_LOG_DUMP: {
+ const char *msg = (const char *)(skb->data + sizeof(*hdr));
+
+ dev_notice(wo->hw->dev, "%s\n", msg);
+ break;
+ }
+ case MTK_WED_WO_EVT_PROFILING: {
+ struct mtk_wed_wo_log_info *info;
+ u32 count = (skb->len - sizeof(*hdr)) / sizeof(*info);
+ int i;
+
+ info = (struct mtk_wed_wo_log_info *)(skb->data + sizeof(*hdr));
+ for (i = 0 ; i < count ; i++)
+ dev_notice(wo->hw->dev,
+ "SN:%u latency: total=%u, rro:%u, mod:%u\n",
+ le32_to_cpu(info[i].sn),
+ le32_to_cpu(info[i].total),
+ le32_to_cpu(info[i].rro),
+ le32_to_cpu(info[i].mod));
+ break;
+ }
+ case MTK_WED_WO_EVT_RXCNT_INFO:
+ break;
+ default:
+ break;
+ }
+
+ dev_kfree_skb(skb);
+}
+
+static int
+mtk_wed_mcu_skb_send_msg(struct mtk_wed_wo *wo, struct sk_buff *skb,
+ int id, int cmd, u16 *wait_seq, bool wait_resp)
+{
+ struct mtk_wed_mcu_hdr *hdr;
+
+ /* TODO: make it dynamic based on cmd */
+ wo->mcu.timeout = 20 * HZ;
+
+ hdr = (struct mtk_wed_mcu_hdr *)skb_push(skb, sizeof(*hdr));
+ hdr->cmd = cmd;
+ hdr->length = cpu_to_le16(skb->len);
+
+ if (wait_resp && wait_seq) {
+ u16 seq = ++wo->mcu.seq;
+
+ if (!seq)
+ seq = ++wo->mcu.seq;
+ *wait_seq = seq;
+
+ hdr->flag |= cpu_to_le16(MTK_WED_WARP_CMD_FLAG_NEED_RSP);
+ hdr->seq = cpu_to_le16(seq);
+ }
+ if (id == MTK_WED_MODULE_ID_WO)
+ hdr->flag |= cpu_to_le16(MTK_WED_WARP_CMD_FLAG_FROM_TO_WO);
+
+ dev_kfree_skb(skb);
+ return 0;
+}
+
+static int
+mtk_wed_mcu_parse_response(struct mtk_wed_wo *wo, struct sk_buff *skb,
+ int cmd, int seq)
+{
+ struct mtk_wed_mcu_hdr *hdr;
+
+ if (!skb) {
+ dev_err(wo->hw->dev, "Message %08x (seq %d) timeout\n",
+ cmd, seq);
+ return -ETIMEDOUT;
+ }
+
+ hdr = (struct mtk_wed_mcu_hdr *)skb->data;
+ if (le16_to_cpu(hdr->seq) != seq)
+ return -EAGAIN;
+
+ skb_pull(skb, sizeof(*hdr));
+ switch (cmd) {
+ case MTK_WED_WO_CMD_RXCNT_INFO:
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+int mtk_wed_mcu_send_msg(struct mtk_wed_wo *wo, int id, int cmd,
+ const void *data, int len, bool wait_resp)
+{
+ unsigned long expires;
+ struct sk_buff *skb;
+ u16 seq;
+ int ret;
+
+ skb = mtk_wed_mcu_msg_alloc(data, len);
+ if (!skb)
+ return -ENOMEM;
+
+ mutex_lock(&wo->mcu.mutex);
+
+ ret = mtk_wed_mcu_skb_send_msg(wo, skb, id, cmd, &seq, wait_resp);
+ if (ret || !wait_resp)
+ goto unlock;
+
+ expires = jiffies + wo->mcu.timeout;
+ do {
+ skb = mtk_wed_mcu_get_response(wo, expires);
+ ret = mtk_wed_mcu_parse_response(wo, skb, cmd, seq);
+ dev_kfree_skb(skb);
+ } while (ret == -EAGAIN);
+
+unlock:
+ mutex_unlock(&wo->mcu.mutex);
+
+ return ret;
+}
+
+static int
+mtk_wed_get_memory_region(struct mtk_wed_wo *wo,
+ struct mtk_wed_wo_memory_region *region)
+{
+ struct reserved_mem *rmem;
+ struct device_node *np;
+ int index;
+
+ index = of_property_match_string(wo->hw->node, "memory-region-names",
+ region->name);
+ if (index < 0)
+ return index;
+
+ np = of_parse_phandle(wo->hw->node, "memory-region", index);
+ if (!np)
+ return -ENODEV;
+
+ rmem = of_reserved_mem_lookup(np);
+ of_node_put(np);
+
+ if (!rmem)
+ return -ENODEV;
+
+ region->phy_addr = rmem->base;
+ region->size = rmem->size;
+ region->addr = devm_ioremap(wo->hw->dev, region->phy_addr, region->size);
+
+ return !region->addr ? -EINVAL : 0;
+}
+
+static int
+mtk_wed_mcu_run_firmware(struct mtk_wed_wo *wo, const struct firmware *fw,
+ struct mtk_wed_wo_memory_region *region)
+{
+ const u8 *first_region_ptr, *region_ptr, *trailer_ptr, *ptr = fw->data;
+ const struct mtk_wed_fw_trailer *trailer;
+ const struct mtk_wed_fw_region *fw_region;
+
+ trailer_ptr = fw->data + fw->size - sizeof(*trailer);
+ trailer = (const struct mtk_wed_fw_trailer *)trailer_ptr;
+ region_ptr = trailer_ptr - trailer->num_region * sizeof(*fw_region);
+ first_region_ptr = region_ptr;
+
+ while (region_ptr < trailer_ptr) {
+ u32 length;
+
+ fw_region = (const struct mtk_wed_fw_region *)region_ptr;
+ length = le32_to_cpu(fw_region->len);
+
+ if (region->phy_addr != le32_to_cpu(fw_region->addr))
+ goto next;
+
+ if (region->size < length)
+ goto next;
+
+ if (first_region_ptr < ptr + length)
+ goto next;
+
+ if (region->shared && region->consumed)
+ return 0;
+
+ if (!region->shared || !region->consumed) {
+ memcpy_toio(region->addr, ptr, length);
+ region->consumed = true;
+ return 0;
+ }
+next:
+ region_ptr += sizeof(*fw_region);
+ ptr += length;
+ }
+
+ return -EINVAL;
+}
+
+static int
+mtk_wed_mcu_load_firmware(struct mtk_wed_wo *wo)
+{
+ static struct mtk_wed_wo_memory_region mem_region[] = {
+ [MTK_WED_WO_REGION_EMI] = {
+ .name = "wo-emi",
+ },
+ [MTK_WED_WO_REGION_ILM] = {
+ .name = "wo-ilm",
+ },
+ [MTK_WED_WO_REGION_DATA] = {
+ .name = "wo-data",
+ .shared = true,
+ },
+ };
+ const struct mtk_wed_fw_trailer *trailer;
+ const struct firmware *fw;
+ const char *fw_name;
+ u32 val, boot_cr;
+ int ret, i;
+
+ /* load firmware region metadata */
+ for (i = 0; i < ARRAY_SIZE(mem_region); i++) {
+ ret = mtk_wed_get_memory_region(wo, &mem_region[i]);
+ if (ret)
+ return ret;
+ }
+
+ wo->boot.name = "wo-boot";
+ ret = mtk_wed_get_memory_region(wo, &wo->boot);
+ if (ret)
+ return ret;
+
+ /* set dummy cr */
+ wed_w32(wo->hw->wed_dev, MTK_WED_SCR0 + 4 * MTK_WED_DUMMY_CR_FWDL,
+ wo->hw->index + 1);
+
+ /* load firmware */
+ fw_name = wo->hw->index ? MT7986_FIRMWARE_WO1 : MT7986_FIRMWARE_WO0;
+ ret = request_firmware(&fw, fw_name, wo->hw->dev);
+ if (ret)
+ return ret;
+
+ trailer = (void *)(fw->data + fw->size -
+ sizeof(struct mtk_wed_fw_trailer));
+ dev_info(wo->hw->dev,
+ "MTK WED WO Firmware Version: %.10s, Build Time: %.15s\n",
+ trailer->fw_ver, trailer->build_date);
+ dev_info(wo->hw->dev, "MTK WED WO Chip ID %02x Region %d\n",
+ trailer->chip_id, trailer->num_region);
+
+ for (i = 0; i < ARRAY_SIZE(mem_region); i++) {
+ ret = mtk_wed_mcu_run_firmware(wo, fw, &mem_region[i]);
+ if (ret)
+ goto out;
+ }
+
+ /* set the start address */
+ boot_cr = wo->hw->index ? MTK_WO_MCU_CFG_LS_WA_BOOT_ADDR_ADDR
+ : MTK_WO_MCU_CFG_LS_WM_BOOT_ADDR_ADDR;
+ wo_w32(wo, boot_cr, mem_region[MTK_WED_WO_REGION_EMI].phy_addr >> 16);
+ /* wo firmware reset */
+ wo_w32(wo, MTK_WO_MCU_CFG_LS_WF_MCCR_CLR_ADDR, 0xc00);
+
+ val = wo_r32(wo, MTK_WO_MCU_CFG_LS_WF_MCU_CFG_WM_WA_ADDR);
+ val |= wo->hw->index ? MTK_WO_MCU_CFG_LS_WF_WM_WA_WA_CPU_RSTB_MASK
+ : MTK_WO_MCU_CFG_LS_WF_WM_WA_WM_CPU_RSTB_MASK;
+ wo_w32(wo, MTK_WO_MCU_CFG_LS_WF_MCU_CFG_WM_WA_ADDR, val);
+out:
+ release_firmware(fw);
+
+ return ret;
+}
+
+static u32
+mtk_wed_mcu_read_fw_dl(struct mtk_wed_wo *wo)
+{
+ return wed_r32(wo->hw->wed_dev,
+ MTK_WED_SCR0 + 4 * MTK_WED_DUMMY_CR_FWDL);
+}
+
+int mtk_wed_mcu_init(struct mtk_wed_wo *wo)
+{
+ u32 val;
+ int ret;
+
+ skb_queue_head_init(&wo->mcu.res_q);
+ init_waitqueue_head(&wo->mcu.wait);
+ mutex_init(&wo->mcu.mutex);
+
+ ret = mtk_wed_mcu_load_firmware(wo);
+ if (ret)
+ return ret;
+
+ return readx_poll_timeout(mtk_wed_mcu_read_fw_dl, wo, val, !val,
+ 100, MTK_FW_DL_TIMEOUT);
+}
+
+MODULE_FIRMWARE(MT7986_FIRMWARE_WO0);
+MODULE_FIRMWARE(MT7986_FIRMWARE_WO1);
--- a/drivers/net/ethernet/mediatek/mtk_wed_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_regs.h
@@ -152,6 +152,7 @@ struct mtk_wdma_desc {
#define MTK_WED_RING_RX(_n) (0x400 + (_n) * 0x10)
+#define MTK_WED_SCR0 0x3c0
#define MTK_WED_WPDMA_INT_TRIGGER 0x504
#define MTK_WED_WPDMA_INT_TRIGGER_RX_DONE BIT(1)
#define MTK_WED_WPDMA_INT_TRIGGER_TX_DONE GENMASK(5, 4)
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2022 Lorenzo Bianconi <lorenzo@kernel.org> */
+
+#ifndef __MTK_WED_WO_H
+#define __MTK_WED_WO_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+
+struct mtk_wed_hw;
+
+struct mtk_wed_mcu_hdr {
+ /* DW0 */
+ u8 version;
+ u8 cmd;
+ __le16 length;
+
+ /* DW1 */
+ __le16 seq;
+ __le16 flag;
+
+ /* DW2 */
+ __le32 status;
+
+ /* DW3 */
+ u8 rsv[20];
+};
+
+struct mtk_wed_wo_log_info {
+ __le32 sn;
+ __le32 total;
+ __le32 rro;
+ __le32 mod;
+};
+
+enum mtk_wed_wo_event {
+ MTK_WED_WO_EVT_LOG_DUMP = 0x1,
+ MTK_WED_WO_EVT_PROFILING = 0x2,
+ MTK_WED_WO_EVT_RXCNT_INFO = 0x3,
+};
+
+#define MTK_WED_MODULE_ID_WO 1
+#define MTK_FW_DL_TIMEOUT 4000000 /* us */
+#define MTK_WOCPU_TIMEOUT 2000000 /* us */
+
+enum {
+ MTK_WED_WARP_CMD_FLAG_RSP = BIT(0),
+ MTK_WED_WARP_CMD_FLAG_NEED_RSP = BIT(1),
+ MTK_WED_WARP_CMD_FLAG_FROM_TO_WO = BIT(2),
+};
+
+enum {
+ MTK_WED_WO_REGION_EMI,
+ MTK_WED_WO_REGION_ILM,
+ MTK_WED_WO_REGION_DATA,
+ MTK_WED_WO_REGION_BOOT,
+ __MTK_WED_WO_REGION_MAX,
+};
+
+enum mtk_wed_dummy_cr_idx {
+ MTK_WED_DUMMY_CR_FWDL,
+ MTK_WED_DUMMY_CR_WO_STATUS,
+};
+
+#define MT7986_FIRMWARE_WO0 "mediatek/mt7986_wo_0.bin"
+#define MT7986_FIRMWARE_WO1 "mediatek/mt7986_wo_1.bin"
+
+#define MTK_WO_MCU_CFG_LS_BASE 0
+#define MTK_WO_MCU_CFG_LS_HW_VER_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x000)
+#define MTK_WO_MCU_CFG_LS_FW_VER_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x004)
+#define MTK_WO_MCU_CFG_LS_CFG_DBG1_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x00c)
+#define MTK_WO_MCU_CFG_LS_CFG_DBG2_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x010)
+#define MTK_WO_MCU_CFG_LS_WF_MCCR_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x014)
+#define MTK_WO_MCU_CFG_LS_WF_MCCR_SET_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x018)
+#define MTK_WO_MCU_CFG_LS_WF_MCCR_CLR_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x01c)
+#define MTK_WO_MCU_CFG_LS_WF_MCU_CFG_WM_WA_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x050)
+#define MTK_WO_MCU_CFG_LS_WM_BOOT_ADDR_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x060)
+#define MTK_WO_MCU_CFG_LS_WA_BOOT_ADDR_ADDR (MTK_WO_MCU_CFG_LS_BASE + 0x064)
+
+#define MTK_WO_MCU_CFG_LS_WF_WM_WA_WM_CPU_RSTB_MASK BIT(5)
+#define MTK_WO_MCU_CFG_LS_WF_WM_WA_WA_CPU_RSTB_MASK BIT(0)
+
+struct mtk_wed_wo_memory_region {
+ const char *name;
+ void __iomem *addr;
+ phys_addr_t phy_addr;
+ u32 size;
+ bool shared:1;
+ bool consumed:1;
+};
+
+struct mtk_wed_fw_region {
+ __le32 decomp_crc;
+ __le32 decomp_len;
+ __le32 decomp_blk_sz;
+ u8 rsv0[4];
+ __le32 addr;
+ __le32 len;
+ u8 feature_set;
+ u8 rsv1[15];
+} __packed;
+
+struct mtk_wed_fw_trailer {
+ u8 chip_id;
+ u8 eco_code;
+ u8 num_region;
+ u8 format_ver;
+ u8 format_flag;
+ u8 rsv[2];
+ char fw_ver[10];
+ char build_date[15];
+ u32 crc;
+};
+
+struct mtk_wed_wo {
+ struct mtk_wed_hw *hw;
+ struct mtk_wed_wo_memory_region boot;
+
+ struct {
+ struct mutex mutex;
+ int timeout;
+ u16 seq;
+
+ struct sk_buff_head res_q;
+ wait_queue_head_t wait;
+ } mcu;
+};
+
+static inline int
+mtk_wed_mcu_check_msg(struct mtk_wed_wo *wo, struct sk_buff *skb)
+{
+ struct mtk_wed_mcu_hdr *hdr = (struct mtk_wed_mcu_hdr *)skb->data;
+
+ if (hdr->version)
+ return -EINVAL;
+
+ if (skb->len < sizeof(*hdr) || skb->len != le16_to_cpu(hdr->length))
+ return -EINVAL;
+
+ return 0;
+}
+
+void mtk_wed_mcu_rx_event(struct mtk_wed_wo *wo, struct sk_buff *skb);
+void mtk_wed_mcu_rx_unsolicited_event(struct mtk_wed_wo *wo,
+ struct sk_buff *skb);
+int mtk_wed_mcu_send_msg(struct mtk_wed_wo *wo, int id, int cmd,
+ const void *data, int len, bool wait_resp);
+int mtk_wed_mcu_init(struct mtk_wed_wo *wo);
+
+#endif /* __MTK_WED_WO_H */
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -11,6 +11,35 @@
struct mtk_wed_hw;
struct mtk_wdma_desc;
+enum mtk_wed_wo_cmd {
+ MTK_WED_WO_CMD_WED_CFG,
+ MTK_WED_WO_CMD_WED_RX_STAT,
+ MTK_WED_WO_CMD_RRO_SER,
+ MTK_WED_WO_CMD_DBG_INFO,
+ MTK_WED_WO_CMD_DEV_INFO,
+ MTK_WED_WO_CMD_BSS_INFO,
+ MTK_WED_WO_CMD_STA_REC,
+ MTK_WED_WO_CMD_DEV_INFO_DUMP,
+ MTK_WED_WO_CMD_BSS_INFO_DUMP,
+ MTK_WED_WO_CMD_STA_REC_DUMP,
+ MTK_WED_WO_CMD_BA_INFO_DUMP,
+ MTK_WED_WO_CMD_FBCMD_Q_DUMP,
+ MTK_WED_WO_CMD_FW_LOG_CTRL,
+ MTK_WED_WO_CMD_LOG_FLUSH,
+ MTK_WED_WO_CMD_CHANGE_STATE,
+ MTK_WED_WO_CMD_CPU_STATS_ENABLE,
+ MTK_WED_WO_CMD_CPU_STATS_DUMP,
+ MTK_WED_WO_CMD_EXCEPTION_INIT,
+ MTK_WED_WO_CMD_PROF_CTRL,
+ MTK_WED_WO_CMD_STA_BA_DUMP,
+ MTK_WED_WO_CMD_BA_CTRL_DUMP,
+ MTK_WED_WO_CMD_RXCNT_CTRL,
+ MTK_WED_WO_CMD_RXCNT_INFO,
+ MTK_WED_WO_CMD_SET_CAP,
+ MTK_WED_WO_CMD_CCIF_RING_DUMP,
+ MTK_WED_WO_CMD_WED_END
+};
+
enum mtk_wed_bus_tye {
MTK_WED_BUS_PCIE,
MTK_WED_BUS_AXI,
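
A hedged usage sketch for the request/response API added above: the command ID is real (MTK_WED_WO_CMD_FW_LOG_CTRL, from the enum in this patch), but the payload layout is an assumption for illustration:

static int wo_set_fw_log(struct mtk_wed_wo *wo, u32 level)
{
	__le32 val = cpu_to_le32(level);	/* payload layout assumed */

	/* wait_resp=true: blocks until a reply with a matching sequence
	 * number shows up on mcu.res_q, or -ETIMEDOUT after the 20s
	 * budget set in mtk_wed_mcu_skb_send_msg() */
	return mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
				    MTK_WED_WO_CMD_FW_LOG_CTRL,
				    &val, sizeof(val), true);
}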

View File

@@ -1,737 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 5 Nov 2022 23:36:19 +0100
Subject: [PATCH] net: ethernet: mtk_wed: introduce wed wo support
Introduce WO chip support to the mtk wed driver. MTK WED WO is used to
implement RX Wireless Ethernet Dispatch and offload traffic received by
the wlan NIC to the wired interface.
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
create mode 100644 drivers/net/ethernet/mediatek/mtk_wed_wo.c
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -5,7 +5,7 @@
obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o
-mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed.o mtk_wed_mcu.o
+mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed.o mtk_wed_mcu.o mtk_wed_wo.o
ifdef CONFIG_DEBUG_FS
mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed_debugfs.o
endif
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -16,6 +16,7 @@
#include "mtk_wed_regs.h"
#include "mtk_wed.h"
#include "mtk_ppe.h"
+#include "mtk_wed_wo.h"
#define MTK_PCIE_BASE(n) (0x1a143000 + (n) * 0x2000)
@@ -355,6 +356,8 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_free_buffer(dev);
mtk_wed_free_tx_rings(dev);
+ if (hw->version != 1)
+ mtk_wed_wo_deinit(hw);
if (dev->wlan.bus_type == MTK_WED_BUS_PCIE) {
struct device_node *wlan_node;
@@ -878,9 +881,11 @@ mtk_wed_attach(struct mtk_wed_device *de
}
mtk_wed_hw_init_early(dev);
- if (hw->hifsys)
+ if (hw->version == 1)
regmap_update_bits(hw->hifsys, HIFSYS_DMA_AG_MAP,
BIT(hw->index), 0);
+ else
+ ret = mtk_wed_wo_init(hw);
out:
mutex_unlock(&hw_lock);
--- a/drivers/net/ethernet/mediatek/mtk_wed.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed.h
@@ -10,6 +10,7 @@
#include <linux/netdevice.h>
struct mtk_eth;
+struct mtk_wed_wo;
struct mtk_wed_hw {
struct device_node *node;
@@ -22,6 +23,7 @@ struct mtk_wed_hw {
struct regmap *mirror;
struct dentry *debugfs_dir;
struct mtk_wed_device *wed_dev;
+ struct mtk_wed_wo *wed_wo;
u32 debugfs_reg;
u32 num_flows;
u8 version;
--- a/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
@@ -122,8 +122,7 @@ mtk_wed_mcu_skb_send_msg(struct mtk_wed_
if (id == MTK_WED_MODULE_ID_WO)
hdr->flag |= cpu_to_le16(MTK_WED_WARP_CMD_FLAG_FROM_TO_WO);
- dev_kfree_skb(skb);
- return 0;
+ return mtk_wed_wo_queue_tx_skb(wo, &wo->q_tx, skb);
}
static int
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -0,0 +1,508 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2022 MediaTek Inc.
+ *
+ * Author: Lorenzo Bianconi <lorenzo@kernel.org>
+ * Sujuan Chen <sujuan.chen@mediatek.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/dma-mapping.h>
+#include <linux/of_platform.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/mfd/syscon.h>
+#include <linux/of_irq.h>
+#include <linux/bitfield.h>
+
+#include "mtk_wed.h"
+#include "mtk_wed_regs.h"
+#include "mtk_wed_wo.h"
+
+static u32
+mtk_wed_mmio_r32(struct mtk_wed_wo *wo, u32 reg)
+{
+ u32 val;
+
+ if (regmap_read(wo->mmio.regs, reg, &val))
+ val = ~0;
+
+ return val;
+}
+
+static void
+mtk_wed_mmio_w32(struct mtk_wed_wo *wo, u32 reg, u32 val)
+{
+ regmap_write(wo->mmio.regs, reg, val);
+}
+
+static u32
+mtk_wed_wo_get_isr(struct mtk_wed_wo *wo)
+{
+ u32 val = mtk_wed_mmio_r32(wo, MTK_WED_WO_CCIF_RCHNUM);
+
+ return val & MTK_WED_WO_CCIF_RCHNUM_MASK;
+}
+
+static void
+mtk_wed_wo_set_isr(struct mtk_wed_wo *wo, u32 mask)
+{
+ mtk_wed_mmio_w32(wo, MTK_WED_WO_CCIF_IRQ0_MASK, mask);
+}
+
+static void
+mtk_wed_wo_set_ack(struct mtk_wed_wo *wo, u32 mask)
+{
+ mtk_wed_mmio_w32(wo, MTK_WED_WO_CCIF_ACK, mask);
+}
+
+static void
+mtk_wed_wo_set_isr_mask(struct mtk_wed_wo *wo, u32 mask, u32 val, bool set)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&wo->mmio.lock, flags);
+ wo->mmio.irq_mask &= ~mask;
+ wo->mmio.irq_mask |= val;
+ if (set)
+ mtk_wed_wo_set_isr(wo, wo->mmio.irq_mask);
+ spin_unlock_irqrestore(&wo->mmio.lock, flags);
+}
+
+static void
+mtk_wed_wo_irq_enable(struct mtk_wed_wo *wo, u32 mask)
+{
+ mtk_wed_wo_set_isr_mask(wo, 0, mask, false);
+ tasklet_schedule(&wo->mmio.irq_tasklet);
+}
+
+static void
+mtk_wed_wo_irq_disable(struct mtk_wed_wo *wo, u32 mask)
+{
+ mtk_wed_wo_set_isr_mask(wo, mask, 0, true);
+}
+
+static void
+mtk_wed_wo_kickout(struct mtk_wed_wo *wo)
+{
+ mtk_wed_mmio_w32(wo, MTK_WED_WO_CCIF_BUSY, 1 << MTK_WED_WO_TXCH_NUM);
+ mtk_wed_mmio_w32(wo, MTK_WED_WO_CCIF_TCHNUM, MTK_WED_WO_TXCH_NUM);
+}
+
+static void
+mtk_wed_wo_queue_kick(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
+ u32 val)
+{
+ wmb();
+ mtk_wed_mmio_w32(wo, q->regs.cpu_idx, val);
+}
+
+static void *
+mtk_wed_wo_dequeue(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q, u32 *len,
+ bool flush)
+{
+ int buf_len = SKB_WITH_OVERHEAD(q->buf_size);
+ int index = (q->tail + 1) % q->n_desc;
+ struct mtk_wed_wo_queue_entry *entry;
+ struct mtk_wed_wo_queue_desc *desc;
+ void *buf;
+
+ if (!q->queued)
+ return NULL;
+
+ if (flush)
+ q->desc[index].ctrl |= cpu_to_le32(MTK_WED_WO_CTL_DMA_DONE);
+ else if (!(q->desc[index].ctrl & cpu_to_le32(MTK_WED_WO_CTL_DMA_DONE)))
+ return NULL;
+
+ q->tail = index;
+ q->queued--;
+
+ desc = &q->desc[index];
+ entry = &q->entry[index];
+ buf = entry->buf;
+ if (len)
+ *len = FIELD_GET(MTK_WED_WO_CTL_SD_LEN0,
+ le32_to_cpu(READ_ONCE(desc->ctrl)));
+ if (buf)
+ dma_unmap_single(wo->hw->dev, entry->addr, buf_len,
+ DMA_FROM_DEVICE);
+ entry->buf = NULL;
+
+ return buf;
+}
+
+static int
+mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
+ gfp_t gfp, bool rx)
+{
+ enum dma_data_direction dir = rx ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+ int n_buf = 0;
+
+ spin_lock_bh(&q->lock);
+ while (q->queued < q->n_desc) {
+ void *buf = page_frag_alloc(&q->cache, q->buf_size, gfp);
+ struct mtk_wed_wo_queue_entry *entry;
+ dma_addr_t addr;
+
+ if (!buf)
+ break;
+
+ addr = dma_map_single(wo->hw->dev, buf, q->buf_size, dir);
+ if (unlikely(dma_mapping_error(wo->hw->dev, addr))) {
+ skb_free_frag(buf);
+ break;
+ }
+
+ q->head = (q->head + 1) % q->n_desc;
+ entry = &q->entry[q->head];
+ entry->addr = addr;
+ entry->len = q->buf_size;
+ q->entry[q->head].buf = buf;
+
+ if (rx) {
+ struct mtk_wed_wo_queue_desc *desc = &q->desc[q->head];
+ u32 ctrl = MTK_WED_WO_CTL_LAST_SEC0 |
+ FIELD_PREP(MTK_WED_WO_CTL_SD_LEN0,
+ entry->len);
+
+ WRITE_ONCE(desc->buf0, cpu_to_le32(addr));
+ WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
+ }
+ q->queued++;
+ n_buf++;
+ }
+ spin_unlock_bh(&q->lock);
+
+ return n_buf;
+}
+
+static void
+mtk_wed_wo_rx_complete(struct mtk_wed_wo *wo)
+{
+ mtk_wed_wo_set_ack(wo, MTK_WED_WO_RXCH_INT_MASK);
+ mtk_wed_wo_irq_enable(wo, MTK_WED_WO_RXCH_INT_MASK);
+}
+
+static void
+mtk_wed_wo_rx_run_queue(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
+{
+ for (;;) {
+ struct mtk_wed_mcu_hdr *hdr;
+ struct sk_buff *skb;
+ void *data;
+ u32 len;
+
+ data = mtk_wed_wo_dequeue(wo, q, &len, false);
+ if (!data)
+ break;
+
+ skb = build_skb(data, q->buf_size);
+ if (!skb) {
+ skb_free_frag(data);
+ continue;
+ }
+
+ __skb_put(skb, len);
+ if (mtk_wed_mcu_check_msg(wo, skb)) {
+ dev_kfree_skb(skb);
+ continue;
+ }
+
+ hdr = (struct mtk_wed_mcu_hdr *)skb->data;
+ if (hdr->flag & cpu_to_le16(MTK_WED_WARP_CMD_FLAG_RSP))
+ mtk_wed_mcu_rx_event(wo, skb);
+ else
+ mtk_wed_mcu_rx_unsolicited_event(wo, skb);
+ }
+
+ if (mtk_wed_wo_queue_refill(wo, q, GFP_ATOMIC, true)) {
+ u32 index = (q->head - 1) % q->n_desc;
+
+ mtk_wed_wo_queue_kick(wo, q, index);
+ }
+}
+
+static irqreturn_t
+mtk_wed_wo_irq_handler(int irq, void *data)
+{
+ struct mtk_wed_wo *wo = data;
+
+ mtk_wed_wo_set_isr(wo, 0);
+ tasklet_schedule(&wo->mmio.irq_tasklet);
+
+ return IRQ_HANDLED;
+}
+
+static void mtk_wed_wo_irq_tasklet(struct tasklet_struct *t)
+{
+ struct mtk_wed_wo *wo = from_tasklet(wo, t, mmio.irq_tasklet);
+ u32 intr, mask;
+
+ /* disable interrupts */
+ mtk_wed_wo_set_isr(wo, 0);
+
+ intr = mtk_wed_wo_get_isr(wo);
+ intr &= wo->mmio.irq_mask;
+ mask = intr & (MTK_WED_WO_RXCH_INT_MASK | MTK_WED_WO_EXCEPTION_INT_MASK);
+ mtk_wed_wo_irq_disable(wo, mask);
+
+ if (intr & MTK_WED_WO_RXCH_INT_MASK) {
+ mtk_wed_wo_rx_run_queue(wo, &wo->q_rx);
+ mtk_wed_wo_rx_complete(wo);
+ }
+}
+
+/* mtk wed wo hw queues */
+
+static int
+mtk_wed_wo_queue_alloc(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
+ int n_desc, int buf_size, int index,
+ struct mtk_wed_wo_queue_regs *regs)
+{
+ spin_lock_init(&q->lock);
+ q->regs = *regs;
+ q->n_desc = n_desc;
+ q->buf_size = buf_size;
+
+ q->desc = dmam_alloc_coherent(wo->hw->dev, n_desc * sizeof(*q->desc),
+ &q->desc_dma, GFP_KERNEL);
+ if (!q->desc)
+ return -ENOMEM;
+
+ q->entry = devm_kzalloc(wo->hw->dev, n_desc * sizeof(*q->entry),
+ GFP_KERNEL);
+ if (!q->entry)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void
+mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
+{
+ mtk_wed_mmio_w32(wo, q->regs.cpu_idx, 0);
+ dma_free_coherent(wo->hw->dev, q->n_desc * sizeof(*q->desc), q->desc,
+ q->desc_dma);
+}
+
+static void
+mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
+{
+ struct page *page;
+ int i;
+
+ spin_lock_bh(&q->lock);
+ for (i = 0; i < q->n_desc; i++) {
+ struct mtk_wed_wo_queue_entry *entry = &q->entry[i];
+
+ dma_unmap_single(wo->hw->dev, entry->addr, entry->len,
+ DMA_TO_DEVICE);
+ skb_free_frag(entry->buf);
+ entry->buf = NULL;
+ }
+ spin_unlock_bh(&q->lock);
+
+ if (!q->cache.va)
+ return;
+
+ page = virt_to_page(q->cache.va);
+ __page_frag_cache_drain(page, q->cache.pagecnt_bias);
+ memset(&q->cache, 0, sizeof(q->cache));
+}
+
+static void
+mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
+{
+ struct page *page;
+
+ spin_lock_bh(&q->lock);
+ for (;;) {
+ void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
+
+ if (!buf)
+ break;
+
+ skb_free_frag(buf);
+ }
+ spin_unlock_bh(&q->lock);
+
+ if (!q->cache.va)
+ return;
+
+ page = virt_to_page(q->cache.va);
+ __page_frag_cache_drain(page, q->cache.pagecnt_bias);
+ memset(&q->cache, 0, sizeof(q->cache));
+}
+
+static void
+mtk_wed_wo_queue_reset(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
+{
+ mtk_wed_mmio_w32(wo, q->regs.cpu_idx, 0);
+ mtk_wed_mmio_w32(wo, q->regs.desc_base, q->desc_dma);
+ mtk_wed_mmio_w32(wo, q->regs.ring_size, q->n_desc);
+}
+
+int mtk_wed_wo_queue_tx_skb(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
+ struct sk_buff *skb)
+{
+ struct mtk_wed_wo_queue_entry *entry;
+ struct mtk_wed_wo_queue_desc *desc;
+ int ret = 0, index;
+ u32 ctrl;
+
+ spin_lock_bh(&q->lock);
+
+ q->tail = mtk_wed_mmio_r32(wo, q->regs.dma_idx);
+ index = (q->head + 1) % q->n_desc;
+ if (q->tail == index) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ entry = &q->entry[index];
+ if (skb->len > entry->len) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ desc = &q->desc[index];
+ q->head = index;
+
+ dma_sync_single_for_cpu(wo->hw->dev, entry->addr, skb->len,
+ DMA_TO_DEVICE);
+ memcpy(entry->buf, skb->data, skb->len);
+ dma_sync_single_for_device(wo->hw->dev, entry->addr, skb->len,
+ DMA_TO_DEVICE);
+
+ ctrl = FIELD_PREP(MTK_WED_WO_CTL_SD_LEN0, skb->len) |
+ MTK_WED_WO_CTL_LAST_SEC0 | MTK_WED_WO_CTL_DMA_DONE;
+ WRITE_ONCE(desc->buf0, cpu_to_le32(entry->addr));
+ WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
+
+ mtk_wed_wo_queue_kick(wo, q, q->head);
+ mtk_wed_wo_kickout(wo);
+out:
+ spin_unlock_bh(&q->lock);
+
+ dev_kfree_skb(skb);
+
+ return ret;
+}
+
+static int
+mtk_wed_wo_exception_init(struct mtk_wed_wo *wo)
+{
+ return 0;
+}
+
+static int
+mtk_wed_wo_hardware_init(struct mtk_wed_wo *wo)
+{
+ struct mtk_wed_wo_queue_regs regs;
+ struct device_node *np;
+ int ret;
+
+ np = of_parse_phandle(wo->hw->node, "mediatek,wo-ccif", 0);
+ if (!np)
+ return -ENODEV;
+
+ wo->mmio.regs = syscon_regmap_lookup_by_phandle(np, NULL);
+ if (IS_ERR_OR_NULL(wo->mmio.regs))
+ return PTR_ERR(wo->mmio.regs);
+
+ wo->mmio.irq = irq_of_parse_and_map(np, 0);
+ wo->mmio.irq_mask = MTK_WED_WO_ALL_INT_MASK;
+ spin_lock_init(&wo->mmio.lock);
+ tasklet_setup(&wo->mmio.irq_tasklet, mtk_wed_wo_irq_tasklet);
+
+ ret = devm_request_irq(wo->hw->dev, wo->mmio.irq,
+ mtk_wed_wo_irq_handler, IRQF_TRIGGER_HIGH,
+ KBUILD_MODNAME, wo);
+ if (ret)
+ goto error;
+
+ regs.desc_base = MTK_WED_WO_CCIF_DUMMY1;
+ regs.ring_size = MTK_WED_WO_CCIF_DUMMY2;
+ regs.dma_idx = MTK_WED_WO_CCIF_SHADOW4;
+ regs.cpu_idx = MTK_WED_WO_CCIF_DUMMY3;
+
+ ret = mtk_wed_wo_queue_alloc(wo, &wo->q_tx, MTK_WED_WO_RING_SIZE,
+ MTK_WED_WO_CMD_LEN, MTK_WED_WO_TXCH_NUM,
+ &regs);
+ if (ret)
+ goto error;
+
+ mtk_wed_wo_queue_refill(wo, &wo->q_tx, GFP_KERNEL, false);
+ mtk_wed_wo_queue_reset(wo, &wo->q_tx);
+
+ regs.desc_base = MTK_WED_WO_CCIF_DUMMY5;
+ regs.ring_size = MTK_WED_WO_CCIF_DUMMY6;
+ regs.dma_idx = MTK_WED_WO_CCIF_SHADOW8;
+ regs.cpu_idx = MTK_WED_WO_CCIF_DUMMY7;
+
+ ret = mtk_wed_wo_queue_alloc(wo, &wo->q_rx, MTK_WED_WO_RING_SIZE,
+ MTK_WED_WO_CMD_LEN, MTK_WED_WO_RXCH_NUM,
+ &regs);
+ if (ret)
+ goto error;
+
+ mtk_wed_wo_queue_refill(wo, &wo->q_rx, GFP_KERNEL, true);
+ mtk_wed_wo_queue_reset(wo, &wo->q_rx);
+
+ /* rx queue irqmask */
+ mtk_wed_wo_set_isr(wo, wo->mmio.irq_mask);
+
+ return 0;
+
+error:
+ devm_free_irq(wo->hw->dev, wo->mmio.irq, wo);
+
+ return ret;
+}
+
+static void
+mtk_wed_wo_hw_deinit(struct mtk_wed_wo *wo)
+{
+ /* disable interrupts */
+ mtk_wed_wo_set_isr(wo, 0);
+
+ tasklet_disable(&wo->mmio.irq_tasklet);
+
+ disable_irq(wo->mmio.irq);
+ devm_free_irq(wo->hw->dev, wo->mmio.irq, wo);
+
+ mtk_wed_wo_queue_tx_clean(wo, &wo->q_tx);
+ mtk_wed_wo_queue_rx_clean(wo, &wo->q_rx);
+ mtk_wed_wo_queue_free(wo, &wo->q_tx);
+ mtk_wed_wo_queue_free(wo, &wo->q_rx);
+}
+
+int mtk_wed_wo_init(struct mtk_wed_hw *hw)
+{
+ struct mtk_wed_wo *wo;
+ int ret;
+
+ wo = devm_kzalloc(hw->dev, sizeof(*wo), GFP_KERNEL);
+ if (!wo)
+ return -ENOMEM;
+
+ hw->wed_wo = wo;
+ wo->hw = hw;
+
+ ret = mtk_wed_wo_hardware_init(wo);
+ if (ret)
+ return ret;
+
+ ret = mtk_wed_mcu_init(wo);
+ if (ret)
+ return ret;
+
+ return mtk_wed_wo_exception_init(wo);
+}
+
+void mtk_wed_wo_deinit(struct mtk_wed_hw *hw)
+{
+ struct mtk_wed_wo *wo = hw->wed_wo;
+
+ mtk_wed_wo_hw_deinit(wo);
+}
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h
@@ -80,6 +80,54 @@ enum mtk_wed_dummy_cr_idx {
#define MTK_WO_MCU_CFG_LS_WF_WM_WA_WM_CPU_RSTB_MASK BIT(5)
#define MTK_WO_MCU_CFG_LS_WF_WM_WA_WA_CPU_RSTB_MASK BIT(0)
+#define MTK_WED_WO_RING_SIZE 256
+#define MTK_WED_WO_CMD_LEN 1504
+
+#define MTK_WED_WO_TXCH_NUM 0
+#define MTK_WED_WO_RXCH_NUM 1
+#define MTK_WED_WO_RXCH_WO_EXCEPTION 7
+
+#define MTK_WED_WO_TXCH_INT_MASK BIT(0)
+#define MTK_WED_WO_RXCH_INT_MASK BIT(1)
+#define MTK_WED_WO_EXCEPTION_INT_MASK BIT(7)
+#define MTK_WED_WO_ALL_INT_MASK (MTK_WED_WO_RXCH_INT_MASK | \
+ MTK_WED_WO_EXCEPTION_INT_MASK)
+
+#define MTK_WED_WO_CCIF_BUSY 0x004
+#define MTK_WED_WO_CCIF_START 0x008
+#define MTK_WED_WO_CCIF_TCHNUM 0x00c
+#define MTK_WED_WO_CCIF_RCHNUM 0x010
+#define MTK_WED_WO_CCIF_RCHNUM_MASK GENMASK(7, 0)
+
+#define MTK_WED_WO_CCIF_ACK 0x014
+#define MTK_WED_WO_CCIF_IRQ0_MASK 0x018
+#define MTK_WED_WO_CCIF_IRQ1_MASK 0x01c
+#define MTK_WED_WO_CCIF_DUMMY1 0x020
+#define MTK_WED_WO_CCIF_DUMMY2 0x024
+#define MTK_WED_WO_CCIF_DUMMY3 0x028
+#define MTK_WED_WO_CCIF_DUMMY4 0x02c
+#define MTK_WED_WO_CCIF_SHADOW1 0x030
+#define MTK_WED_WO_CCIF_SHADOW2 0x034
+#define MTK_WED_WO_CCIF_SHADOW3 0x038
+#define MTK_WED_WO_CCIF_SHADOW4 0x03c
+#define MTK_WED_WO_CCIF_DUMMY5 0x050
+#define MTK_WED_WO_CCIF_DUMMY6 0x054
+#define MTK_WED_WO_CCIF_DUMMY7 0x058
+#define MTK_WED_WO_CCIF_DUMMY8 0x05c
+#define MTK_WED_WO_CCIF_SHADOW5 0x060
+#define MTK_WED_WO_CCIF_SHADOW6 0x064
+#define MTK_WED_WO_CCIF_SHADOW7 0x068
+#define MTK_WED_WO_CCIF_SHADOW8 0x06c
+
+#define MTK_WED_WO_CTL_SD_LEN1 GENMASK(13, 0)
+#define MTK_WED_WO_CTL_LAST_SEC1 BIT(14)
+#define MTK_WED_WO_CTL_BURST BIT(15)
+#define MTK_WED_WO_CTL_SD_LEN0_SHIFT 16
+#define MTK_WED_WO_CTL_SD_LEN0 GENMASK(29, 16)
+#define MTK_WED_WO_CTL_LAST_SEC0 BIT(30)
+#define MTK_WED_WO_CTL_DMA_DONE BIT(31)
+#define MTK_WED_WO_INFO_WINFO GENMASK(15, 0)
+
struct mtk_wed_wo_memory_region {
const char *name;
void __iomem *addr;
@@ -112,10 +160,53 @@ struct mtk_wed_fw_trailer {
u32 crc;
};
+struct mtk_wed_wo_queue_regs {
+ u32 desc_base;
+ u32 ring_size;
+ u32 cpu_idx;
+ u32 dma_idx;
+};
+
+struct mtk_wed_wo_queue_desc {
+ __le32 buf0;
+ __le32 ctrl;
+ __le32 buf1;
+ __le32 info;
+ __le32 reserved[4];
+} __packed __aligned(32);
+
+struct mtk_wed_wo_queue_entry {
+ dma_addr_t addr;
+ void *buf;
+ u32 len;
+};
+
+struct mtk_wed_wo_queue {
+ struct mtk_wed_wo_queue_regs regs;
+
+ struct page_frag_cache cache;
+ spinlock_t lock;
+
+ struct mtk_wed_wo_queue_desc *desc;
+ dma_addr_t desc_dma;
+
+ struct mtk_wed_wo_queue_entry *entry;
+
+ u16 head;
+ u16 tail;
+ int n_desc;
+ int queued;
+ int buf_size;
+
+};
+
struct mtk_wed_wo {
struct mtk_wed_hw *hw;
struct mtk_wed_wo_memory_region boot;
+ struct mtk_wed_wo_queue q_tx;
+ struct mtk_wed_wo_queue q_rx;
+
struct {
struct mutex mutex;
int timeout;
@@ -124,6 +215,15 @@ struct mtk_wed_wo {
struct sk_buff_head res_q;
wait_queue_head_t wait;
} mcu;
+
+ struct {
+ struct regmap *regs;
+
+ spinlock_t lock;
+ struct tasklet_struct irq_tasklet;
+ int irq;
+ u32 irq_mask;
+ } mmio;
};
static inline int
@@ -146,5 +246,9 @@ void mtk_wed_mcu_rx_unsolicited_event(st
int mtk_wed_mcu_send_msg(struct mtk_wed_wo *wo, int id, int cmd,
const void *data, int len, bool wait_resp);
int mtk_wed_mcu_init(struct mtk_wed_wo *wo);
+int mtk_wed_wo_init(struct mtk_wed_hw *hw);
+void mtk_wed_wo_deinit(struct mtk_wed_hw *hw);
+int mtk_wed_wo_queue_tx_skb(struct mtk_wed_wo *dev, struct mtk_wed_wo_queue *q,
+ struct sk_buff *skb);
#endif /* __MTK_WED_WO_H */
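
The tx path above uses the classic single-producer ring convention: head is the producer index, tail mirrors the hardware's dma_idx, and one slot is sacrificed so that full and empty are distinguishable. A standalone sketch of the index arithmetic from mtk_wed_wo_queue_tx_skb() (ring size shrunk for readability; the driver uses MTK_WED_WO_RING_SIZE = 256):

#include <stdio.h>

#define N_DESC	8

int main(void)
{
	unsigned int head = 0, tail = 0;	/* tail: consumer index */

	for (int i = 0; i < 10; i++) {
		unsigned int next = (head + 1) % N_DESC;

		if (next == tail) {
			/* mirrors the -ENOMEM path in the driver */
			printf("enqueue %d: ring full\n", i);
			continue;
		}
		head = next;
		printf("enqueue %d: head=%u\n", i, head);
	}
	return 0;
}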

View File

@@ -1,79 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 5 Nov 2022 23:36:20 +0100
Subject: [PATCH] net: ethernet: mtk_wed: rename tx_wdma array in rx_wdma
Rename the tx_wdma queue array to rx_wdma since it is the rx side of the
WDMA SoC block. Moreover, rename the mtk_wed_wdma_ring_setup() routine to
mtk_wed_wdma_rx_ring_setup().
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -253,8 +253,8 @@ mtk_wed_free_tx_rings(struct mtk_wed_dev
for (i = 0; i < ARRAY_SIZE(dev->tx_ring); i++)
mtk_wed_free_ring(dev, &dev->tx_ring[i]);
- for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++)
- mtk_wed_free_ring(dev, &dev->tx_wdma[i]);
+ for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
+ mtk_wed_free_ring(dev, &dev->rx_wdma[i]);
}
static void
@@ -688,10 +688,10 @@ mtk_wed_ring_alloc(struct mtk_wed_device
}
static int
-mtk_wed_wdma_ring_setup(struct mtk_wed_device *dev, int idx, int size)
+mtk_wed_wdma_rx_ring_setup(struct mtk_wed_device *dev, int idx, int size)
{
u32 desc_size = sizeof(struct mtk_wdma_desc) * dev->hw->version;
- struct mtk_wed_ring *wdma = &dev->tx_wdma[idx];
+ struct mtk_wed_ring *wdma = &dev->rx_wdma[idx];
if (mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE, desc_size))
return -ENOMEM;
@@ -805,9 +805,9 @@ mtk_wed_start(struct mtk_wed_device *dev
{
int i;
- for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++)
- if (!dev->tx_wdma[i].desc)
- mtk_wed_wdma_ring_setup(dev, i, 16);
+ for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
+ if (!dev->rx_wdma[i].desc)
+ mtk_wed_wdma_rx_ring_setup(dev, i, 16);
mtk_wed_hw_init(dev);
mtk_wed_configure_irq(dev, irq_mask);
@@ -916,7 +916,7 @@ mtk_wed_tx_ring_setup(struct mtk_wed_dev
sizeof(*ring->desc)))
return -ENOMEM;
- if (mtk_wed_wdma_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
+ if (mtk_wed_wdma_rx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
return -ENOMEM;
ring->reg_base = MTK_WED_RING_TX(idx);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -7,6 +7,7 @@
#include <linux/pci.h>
#define MTK_WED_TX_QUEUES 2
+#define MTK_WED_RX_QUEUES 2
struct mtk_wed_hw;
struct mtk_wdma_desc;
@@ -66,7 +67,7 @@ struct mtk_wed_device {
struct mtk_wed_ring tx_ring[MTK_WED_TX_QUEUES];
struct mtk_wed_ring txfree_ring;
- struct mtk_wed_ring tx_wdma[MTK_WED_TX_QUEUES];
+ struct mtk_wed_ring rx_wdma[MTK_WED_RX_QUEUES];
struct {
int size;

View File

@@ -1,149 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 5 Nov 2022 23:36:22 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add rx mib counters
Introduce WED RX MIB counters support available on MT7986a SoC.
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_debugfs.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_debugfs.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2021 Felix Fietkau <nbd@nbd.name> */
#include <linux/seq_file.h>
+#include <linux/soc/mediatek/mtk_wed.h>
#include "mtk_wed.h"
#include "mtk_wed_regs.h"
@@ -18,6 +19,8 @@ enum {
DUMP_TYPE_WDMA,
DUMP_TYPE_WPDMA_TX,
DUMP_TYPE_WPDMA_TXFREE,
+ DUMP_TYPE_WPDMA_RX,
+ DUMP_TYPE_WED_RRO,
};
#define DUMP_STR(_str) { _str, 0, DUMP_TYPE_STRING }
@@ -36,6 +39,9 @@ enum {
#define DUMP_WPDMA_TX_RING(_n) DUMP_RING("WPDMA_TX" #_n, 0, DUMP_TYPE_WPDMA_TX, _n)
#define DUMP_WPDMA_TXFREE_RING DUMP_RING("WPDMA_RX1", 0, DUMP_TYPE_WPDMA_TXFREE)
+#define DUMP_WPDMA_RX_RING(_n) DUMP_RING("WPDMA_RX" #_n, 0, DUMP_TYPE_WPDMA_RX, _n)
+#define DUMP_WED_RRO_RING(_base)DUMP_RING("WED_RRO_MIOD", MTK_##_base, DUMP_TYPE_WED_RRO)
+#define DUMP_WED_RRO_FDBK(_base)DUMP_RING("WED_RRO_FDBK", MTK_##_base, DUMP_TYPE_WED_RRO)
static void
print_reg_val(struct seq_file *s, const char *name, u32 val)
@@ -57,6 +63,7 @@ dump_wed_regs(struct seq_file *s, struct
cur > regs ? "\n" : "",
cur->name);
continue;
+ case DUMP_TYPE_WED_RRO:
case DUMP_TYPE_WED:
val = wed_r32(dev, cur->offset);
break;
@@ -69,6 +76,9 @@ dump_wed_regs(struct seq_file *s, struct
case DUMP_TYPE_WPDMA_TXFREE:
val = wpdma_txfree_r32(dev, cur->offset);
break;
+ case DUMP_TYPE_WPDMA_RX:
+ val = wpdma_rx_r32(dev, cur->base, cur->offset);
+ break;
}
print_reg_val(s, cur->name, val);
}
@@ -132,6 +142,80 @@ wed_txinfo_show(struct seq_file *s, void
}
DEFINE_SHOW_ATTRIBUTE(wed_txinfo);
+static int
+wed_rxinfo_show(struct seq_file *s, void *data)
+{
+ static const struct reg_dump regs[] = {
+ DUMP_STR("WPDMA RX"),
+ DUMP_WPDMA_RX_RING(0),
+ DUMP_WPDMA_RX_RING(1),
+
+ DUMP_STR("WPDMA RX"),
+ DUMP_WED(WED_WPDMA_RX_D_MIB(0)),
+ DUMP_WED_RING(WED_WPDMA_RING_RX_DATA(0)),
+ DUMP_WED(WED_WPDMA_RX_D_PROCESSED_MIB(0)),
+ DUMP_WED(WED_WPDMA_RX_D_MIB(1)),
+ DUMP_WED_RING(WED_WPDMA_RING_RX_DATA(1)),
+ DUMP_WED(WED_WPDMA_RX_D_PROCESSED_MIB(1)),
+ DUMP_WED(WED_WPDMA_RX_D_COHERENT_MIB),
+
+ DUMP_STR("WED RX"),
+ DUMP_WED_RING(WED_RING_RX_DATA(0)),
+ DUMP_WED_RING(WED_RING_RX_DATA(1)),
+
+ DUMP_STR("WED RRO"),
+ DUMP_WED_RRO_RING(WED_RROQM_MIOD_CTRL0),
+ DUMP_WED(WED_RROQM_MID_MIB),
+ DUMP_WED(WED_RROQM_MOD_MIB),
+ DUMP_WED(WED_RROQM_MOD_COHERENT_MIB),
+ DUMP_WED_RRO_FDBK(WED_RROQM_FDBK_CTRL0),
+ DUMP_WED(WED_RROQM_FDBK_IND_MIB),
+ DUMP_WED(WED_RROQM_FDBK_ENQ_MIB),
+ DUMP_WED(WED_RROQM_FDBK_ANC_MIB),
+ DUMP_WED(WED_RROQM_FDBK_ANC2H_MIB),
+
+ DUMP_STR("WED Route QM"),
+ DUMP_WED(WED_RTQM_R2H_MIB(0)),
+ DUMP_WED(WED_RTQM_R2Q_MIB(0)),
+ DUMP_WED(WED_RTQM_Q2H_MIB(0)),
+ DUMP_WED(WED_RTQM_R2H_MIB(1)),
+ DUMP_WED(WED_RTQM_R2Q_MIB(1)),
+ DUMP_WED(WED_RTQM_Q2H_MIB(1)),
+ DUMP_WED(WED_RTQM_Q2N_MIB),
+ DUMP_WED(WED_RTQM_Q2B_MIB),
+ DUMP_WED(WED_RTQM_PFDBK_MIB),
+
+ DUMP_STR("WED WDMA TX"),
+ DUMP_WED(WED_WDMA_TX_MIB),
+ DUMP_WED_RING(WED_WDMA_RING_TX),
+
+ DUMP_STR("WDMA TX"),
+ DUMP_WDMA(WDMA_GLO_CFG),
+ DUMP_WDMA_RING(WDMA_RING_TX(0)),
+ DUMP_WDMA_RING(WDMA_RING_TX(1)),
+
+ DUMP_STR("WED RX BM"),
+ DUMP_WED(WED_RX_BM_BASE),
+ DUMP_WED(WED_RX_BM_RX_DMAD),
+ DUMP_WED(WED_RX_BM_PTR),
+ DUMP_WED(WED_RX_BM_TKID_MIB),
+ DUMP_WED(WED_RX_BM_BLEN),
+ DUMP_WED(WED_RX_BM_STS),
+ DUMP_WED(WED_RX_BM_INTF2),
+ DUMP_WED(WED_RX_BM_INTF),
+ DUMP_WED(WED_RX_BM_ERR_STS),
+ };
+ struct mtk_wed_hw *hw = s->private;
+ struct mtk_wed_device *dev = hw->wed_dev;
+
+ if (!dev)
+ return 0;
+
+ dump_wed_regs(s, dev, regs, ARRAY_SIZE(regs));
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(wed_rxinfo);
static int
mtk_wed_reg_set(void *data, u64 val)
@@ -175,4 +259,7 @@ void mtk_wed_hw_add_debugfs(struct mtk_w
debugfs_create_u32("regidx", 0600, dir, &hw->debugfs_reg);
debugfs_create_file_unsafe("regval", 0600, dir, hw, &fops_regval);
debugfs_create_file_unsafe("txinfo", 0400, dir, hw, &wed_txinfo_fops);
+ if (hw->version != 1)
+ debugfs_create_file_unsafe("rxinfo", 0400, dir, hw,
+ &wed_rxinfo_fops);
}
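
For context, the debugfs plumbing above relies on the DEFINE_SHOW_ATTRIBUTE() helper: it turns a <name>_show(struct seq_file *, void *) function into a <name>_fops table usable with debugfs_create_file_unsafe(). A minimal sketch with a hypothetical attribute name ("demo"):

static int demo_show(struct seq_file *s, void *data)
{
	/* s->private carries the pointer passed at create time */
	struct mtk_wed_hw *hw = s->private;

	seq_printf(s, "version: %u\n", hw->version);
	return 0;
}
DEFINE_SHOW_ATTRIBUTE(demo);	/* generates demo_fops */

/* in the init path:
 *	debugfs_create_file_unsafe("demo", 0400, dir, hw, &demo_fops);
 */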

View File

@@ -1,36 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 17 Nov 2022 00:58:46 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: remove cpu_relax in
mtk_pending_work
Get rid of cpu_relax() in the mtk_pending_work() routine since MTK_RESETTING
is set only in mtk_pending_work() and it runs holding the rtnl lock.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3480,11 +3480,8 @@ static void mtk_pending_work(struct work
rtnl_lock();
dev_dbg(eth->dev, "[%s][%d] reset\n", __func__, __LINE__);
+ set_bit(MTK_RESETTING, &eth->state);
- while (test_and_set_bit_lock(MTK_RESETTING, &eth->state))
- cpu_relax();
-
- dev_dbg(eth->dev, "[%s][%d] mtk_stop starts\n", __func__, __LINE__);
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!eth->netdev[i])
@@ -3518,7 +3515,7 @@ static void mtk_pending_work(struct work
dev_dbg(eth->dev, "[%s][%d] reset done\n", __func__, __LINE__);
- clear_bit_unlock(MTK_RESETTING, &eth->state);
+ clear_bit(MTK_RESETTING, &eth->state);
rtnl_unlock();
}
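
The simplification works because rtnl_lock already serialises the only writer; the flag's remaining job is visibility to readers in the datapath, which just test_bit() it and bail out. A short sketch of the two idioms (reader side illustrative):

/* before: an acquire/release bit spinlock, only needed if several
 * writers could race for the flag */
while (test_and_set_bit_lock(MTK_RESETTING, &eth->state))
	cpu_relax();
/* ... */
clear_bit_unlock(MTK_RESETTING, &eth->state);

/* after: rtnl_lock is the real mutual exclusion, a plain flag suffices */
set_bit(MTK_RESETTING, &eth->state);
/* ... */
clear_bit(MTK_RESETTING, &eth->state);

/* readers elsewhere stay unchanged, e.g. (illustrative):
 *	if (test_bit(MTK_RESETTING, &eth->state))
 *		return -EBUSY;
 */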

View File

@@ -1,80 +0,0 @@
From: Sujuan Chen <sujuan.chen@mediatek.com>
Date: Thu, 24 Nov 2022 11:18:14 +0800
Subject: [PATCH] net: ethernet: mtk_wed: add wcid overwritten support for wed
v1
All wed versions should enable the wcid overwritten feature,
since the wcid size is controlled by the wlan driver.
Tested-by: Sujuan Chen <sujuan.chen@mediatek.com>
Co-developed-by: Bo Jiao <bo.jiao@mediatek.com>
Signed-off-by: Bo Jiao <bo.jiao@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -526,9 +526,9 @@ mtk_wed_dma_disable(struct mtk_wed_devic
MTK_WED_WPDMA_RX_D_RX_DRV_EN);
wed_clr(dev, MTK_WED_WDMA_GLO_CFG,
MTK_WED_WDMA_GLO_CFG_TX_DDONE_CHK);
-
- mtk_wed_set_512_support(dev, false);
}
+
+ mtk_wed_set_512_support(dev, false);
}
static void
@@ -1290,9 +1290,10 @@ mtk_wed_start(struct mtk_wed_device *dev
if (mtk_wed_rro_cfg(dev))
return;
- mtk_wed_set_512_support(dev, dev->wlan.wcid_512);
}
+ mtk_wed_set_512_support(dev, dev->wlan.wcid_512);
+
mtk_wed_dma_enable(dev);
dev->running = true;
}
@@ -1358,11 +1359,13 @@ mtk_wed_attach(struct mtk_wed_device *de
}
mtk_wed_hw_init_early(dev);
- if (hw->version == 1)
+ if (hw->version == 1) {
regmap_update_bits(hw->hifsys, HIFSYS_DMA_AG_MAP,
BIT(hw->index), 0);
- else
+ } else {
+ dev->rev_id = wed_r32(dev, MTK_WED_REV_ID);
ret = mtk_wed_wo_init(hw);
+ }
out:
if (ret)
mtk_wed_detach(dev);
--- a/drivers/net/ethernet/mediatek/mtk_wed_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_regs.h
@@ -20,6 +20,8 @@ struct mtk_wdma_desc {
__le32 info;
} __packed __aligned(4);
+#define MTK_WED_REV_ID 0x004
+
#define MTK_WED_RESET 0x008
#define MTK_WED_RESET_TX_BM BIT(0)
#define MTK_WED_RESET_TX_FREE_AGENT BIT(4)
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -85,6 +85,9 @@ struct mtk_wed_device {
int irq;
u8 version;
+ /* used by wlan driver */
+ u32 rev_id;
+
struct mtk_wed_ring tx_ring[MTK_WED_TX_QUEUES];
struct mtk_wed_ring rx_ring[MTK_WED_RX_QUEUES];
struct mtk_wed_ring txfree_ring;

View File

@@ -1,85 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:51 +0100
Subject: [PATCH] net: ethernet: mtk_wed: return status value in
mtk_wdma_rx_reset
Move the MTK_WDMA_RESET_IDX configuration into the mtk_wdma_rx_reset() routine.
Increase poll timeout to 10ms in order to be aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -101,17 +101,21 @@ mtk_wdma_read_reset(struct mtk_wed_devic
return wdma_r32(dev, MTK_WDMA_GLO_CFG);
}
-static void
+static int
mtk_wdma_rx_reset(struct mtk_wed_device *dev)
{
u32 status, mask = MTK_WDMA_GLO_CFG_RX_DMA_BUSY;
- int i;
+ int i, ret;
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_RX_DMA_EN);
- if (readx_poll_timeout(mtk_wdma_read_reset, dev, status,
- !(status & mask), 0, 1000))
+ ret = readx_poll_timeout(mtk_wdma_read_reset, dev, status,
+ !(status & mask), 0, 10000);
+ if (ret)
dev_err(dev->hw->dev, "rx reset failed\n");
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
+
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++) {
if (dev->rx_wdma[i].desc)
continue;
@@ -119,6 +123,8 @@ mtk_wdma_rx_reset(struct mtk_wed_device
wdma_w32(dev,
MTK_WDMA_RING_RX(i) + MTK_WED_RING_OFS_CPU_IDX, 0);
}
+
+ return ret;
}
static void
@@ -565,9 +571,7 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_stop(dev);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
-
+ mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
if (mtk_wed_get_rx_capa(dev)) {
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
@@ -582,7 +586,6 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_wo_reset(dev);
mtk_wed_free_rx_rings(dev);
mtk_wed_wo_deinit(hw);
- mtk_wdma_rx_reset(dev);
}
if (dev->wlan.bus_type == MTK_WED_BUS_PCIE) {
@@ -999,11 +1002,7 @@ mtk_wed_reset_dma(struct mtk_wed_device
wed_w32(dev, MTK_WED_RESET_IDX, 0);
}
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
-
- if (mtk_wed_get_rx_capa(dev))
- mtk_wdma_rx_reset(dev);
+ mtk_wdma_rx_reset(dev);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WDMA_INT_AGENT);
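
For readers unfamiliar with the iopoll helpers: readx_poll_timeout(op, args, val, cond, sleep_us, timeout_us) re-evaluates op(args) into val until cond holds or the timeout budget is spent, returning 0 on success and -ETIMEDOUT otherwise. Roughly what the call above expands to (simplified sketch; see <linux/iopoll.h> for the exact behaviour, which also performs one final read after timeout):

ktime_t timeout = ktime_add_us(ktime_get(), 10000);

for (;;) {
	status = mtk_wdma_read_reset(dev);
	if (!(status & MTK_WDMA_GLO_CFG_RX_DMA_BUSY))
		break;				/* cond met: ret = 0 */
	if (ktime_compare(ktime_get(), timeout) > 0) {
		ret = -ETIMEDOUT;		/* now propagated to callers */
		break;
	}
	cpu_relax();				/* sleep_us == 0: busy poll */
}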

View File

@@ -1,52 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:52 +0100
Subject: [PATCH] net: ethernet: mtk_wed: move MTK_WDMA_RESET_IDX_TX
configuration in mtk_wdma_tx_reset
Remove duplicated code. Increase poll timeout to 10ms in order to be
aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -135,16 +135,15 @@ mtk_wdma_tx_reset(struct mtk_wed_device
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
if (readx_poll_timeout(mtk_wdma_read_reset, dev, status,
- !(status & mask), 0, 1000))
+ !(status & mask), 0, 10000))
dev_err(dev->hw->dev, "tx reset failed\n");
- for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++) {
- if (dev->tx_wdma[i].desc)
- continue;
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_TX);
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
+ for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++)
wdma_w32(dev,
MTK_WDMA_RING_TX(i) + MTK_WED_RING_OFS_CPU_IDX, 0);
- }
}
static void
@@ -573,12 +572,6 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
- if (mtk_wed_get_rx_capa(dev)) {
- wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_TX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
- }
-
mtk_wed_free_tx_buffer(dev);
mtk_wed_free_tx_rings(dev);

View File

@@ -1,98 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:53 +0100
Subject: [PATCH] net: ethernet: mtk_wed: update mtk_wed_stop
Update mtk_wed_stop routine and rename old mtk_wed_stop() to
mtk_wed_deinit(). This is a preliminary patch to add Wireless Ethernet
Dispatcher reset support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -539,14 +539,8 @@ mtk_wed_dma_disable(struct mtk_wed_devic
static void
mtk_wed_stop(struct mtk_wed_device *dev)
{
- mtk_wed_dma_disable(dev);
mtk_wed_set_ext_int(dev, false);
- wed_clr(dev, MTK_WED_CTRL,
- MTK_WED_CTRL_WDMA_INT_AGENT_EN |
- MTK_WED_CTRL_WPDMA_INT_AGENT_EN |
- MTK_WED_CTRL_WED_TX_BM_EN |
- MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
wed_w32(dev, MTK_WED_WPDMA_INT_TRIGGER, 0);
wed_w32(dev, MTK_WED_WDMA_INT_TRIGGER, 0);
wdma_w32(dev, MTK_WDMA_INT_MASK, 0);
@@ -558,7 +552,27 @@ mtk_wed_stop(struct mtk_wed_device *dev)
wed_w32(dev, MTK_WED_EXT_INT_MASK1, 0);
wed_w32(dev, MTK_WED_EXT_INT_MASK2, 0);
- wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_RX_BM_EN);
+}
+
+static void
+mtk_wed_deinit(struct mtk_wed_device *dev)
+{
+ mtk_wed_stop(dev);
+ mtk_wed_dma_disable(dev);
+
+ wed_clr(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WDMA_INT_AGENT_EN |
+ MTK_WED_CTRL_WPDMA_INT_AGENT_EN |
+ MTK_WED_CTRL_WED_TX_BM_EN |
+ MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
+
+ if (dev->hw->version == 1)
+ return;
+
+ wed_clr(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_ROUTE_QM_EN |
+ MTK_WED_CTRL_WED_RX_BM_EN |
+ MTK_WED_CTRL_RX_RRO_QM_EN);
}
static void
@@ -568,7 +582,7 @@ mtk_wed_detach(struct mtk_wed_device *de
mutex_lock(&hw_lock);
- mtk_wed_stop(dev);
+ mtk_wed_deinit(dev);
mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
@@ -670,7 +684,7 @@ mtk_wed_hw_init_early(struct mtk_wed_dev
{
u32 mask, set;
- mtk_wed_stop(dev);
+ mtk_wed_deinit(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
mtk_wed_set_wpdma(dev);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -234,6 +234,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
(_dev)->ops->ppe_check(_dev, _skb, _reason, _hash)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) \
(_dev)->ops->msg_update(_dev, _id, _msg, _len)
+#define mtk_wed_device_stop(_dev) (_dev)->ops->stop(_dev)
+#define mtk_wed_device_dma_reset(_dev) (_dev)->ops->reset_dma(_dev)
#else
static inline bool mtk_wed_device_active(struct mtk_wed_device *dev)
{
@@ -250,6 +252,8 @@ static inline bool mtk_wed_device_active
#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) -ENODEV
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) do {} while (0)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) -ENODEV
+#define mtk_wed_device_stop(_dev) do {} while (0)
+#define mtk_wed_device_dma_reset(_dev) do {} while (0)
#endif
#endif
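
The two new accessors follow the header's existing conditional-ops pattern, sketched in full below so the stub side is visible: when WED support is compiled out, the macros collapse to no-ops (or -ENODEV for value-returning ones) and callers need no #ifdef of their own. The guard symbol is assumed from the Kconfig option seen in this commit's Makefile hunks:

#ifdef CONFIG_NET_MEDIATEK_SOC_WED
#define mtk_wed_device_stop(_dev)	(_dev)->ops->stop(_dev)
#define mtk_wed_device_dma_reset(_dev)	(_dev)->ops->reset_dma(_dev)
#else
#define mtk_wed_device_stop(_dev)	do {} while (0)
#define mtk_wed_device_dma_reset(_dev)	do {} while (0)
#endif

/* a wlan driver can then call these unconditionally, e.g. (illustrative):
 *	if (mtk_wed_device_active(wed))
 *		mtk_wed_device_stop(wed);
 */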

View File

@@ -1,309 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:54 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add mtk_wed_rx_reset routine
Introduce mtk_wed_rx_reset routine in order to reset rx DMA for Wireless
Ethernet Dispatcher available on MT7986 SoC.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -944,42 +944,130 @@ mtk_wed_ring_reset(struct mtk_wed_ring *
}
static u32
-mtk_wed_check_busy(struct mtk_wed_device *dev)
+mtk_wed_check_busy(struct mtk_wed_device *dev, u32 reg, u32 mask)
{
- if (wed_r32(dev, MTK_WED_GLO_CFG) & MTK_WED_GLO_CFG_TX_DMA_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_WPDMA_GLO_CFG) &
- MTK_WED_WPDMA_GLO_CFG_TX_DRV_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_CTRL) & MTK_WED_CTRL_WDMA_INT_AGENT_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_WDMA_GLO_CFG) &
- MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY)
- return true;
-
- if (wdma_r32(dev, MTK_WDMA_GLO_CFG) &
- MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_CTRL) &
- (MTK_WED_CTRL_WED_TX_BM_BUSY | MTK_WED_CTRL_WED_TX_FREE_AGENT_BUSY))
- return true;
-
- return false;
+ return !!(wed_r32(dev, reg) & mask);
}
static int
-mtk_wed_poll_busy(struct mtk_wed_device *dev)
+mtk_wed_poll_busy(struct mtk_wed_device *dev, u32 reg, u32 mask)
{
int sleep = 15000;
int timeout = 100 * sleep;
u32 val;
return read_poll_timeout(mtk_wed_check_busy, val, !val, sleep,
- timeout, false, dev);
+ timeout, false, dev, reg, mask);
+}
+
+static int
+mtk_wed_rx_reset(struct mtk_wed_device *dev)
+{
+ struct mtk_wed_wo *wo = dev->hw->wed_wo;
+ u8 val = MTK_WED_WO_STATE_SER_RESET;
+ int i, ret;
+
+ ret = mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
+ MTK_WED_WO_CMD_CHANGE_STATE, &val,
+ sizeof(val), true);
+ if (ret)
+ return ret;
+
+ wed_clr(dev, MTK_WED_WPDMA_RX_D_GLO_CFG, MTK_WED_WPDMA_RX_D_RX_DRV_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RX_DRV_BUSY);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_INT_AGENT);
+ mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_RX_D_DRV);
+ } else {
+ wed_w32(dev, MTK_WED_WPDMA_RX_D_RST_IDX,
+ MTK_WED_WPDMA_RX_D_RST_CRX_IDX |
+ MTK_WED_WPDMA_RX_D_RST_DRV_IDX);
+
+ wed_set(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE |
+ MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE);
+ wed_clr(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE |
+ MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE);
+
+ wed_w32(dev, MTK_WED_WPDMA_RX_D_RST_IDX, 0);
+ }
+
+ /* reset rro qm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_RX_RRO_QM_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_RRO_QM_BUSY);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_RRO_QM);
+ } else {
+ wed_set(dev, MTK_WED_RROQM_RST_IDX,
+ MTK_WED_RROQM_RST_IDX_MIOD |
+ MTK_WED_RROQM_RST_IDX_FDBK);
+ wed_w32(dev, MTK_WED_RROQM_RST_IDX, 0);
+ }
+
+ /* reset route qm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_RX_ROUTE_QM_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_ROUTE_QM_BUSY);
+ if (ret)
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_ROUTE_QM);
+ else
+ wed_set(dev, MTK_WED_RTQM_GLO_CFG,
+ MTK_WED_RTQM_Q_RST);
+
+ /* reset tx wdma */
+ mtk_wdma_tx_reset(dev);
+
+ /* reset tx wdma drv */
+ wed_clr(dev, MTK_WED_WDMA_GLO_CFG, MTK_WED_WDMA_GLO_CFG_TX_DRV_EN);
+ mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WDMA_INT_AGENT_BUSY);
+ mtk_wed_reset(dev, MTK_WED_RESET_WDMA_TX_DRV);
+
+ /* reset wed rx dma */
+ ret = mtk_wed_poll_busy(dev, MTK_WED_GLO_CFG,
+ MTK_WED_GLO_CFG_RX_DMA_BUSY);
+ wed_clr(dev, MTK_WED_GLO_CFG, MTK_WED_GLO_CFG_RX_DMA_EN);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_WED_RX_DMA);
+ } else {
+ struct mtk_eth *eth = dev->hw->eth;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ wed_set(dev, MTK_WED_RESET_IDX,
+ MTK_WED_RESET_IDX_RX_V2);
+ else
+ wed_set(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, 0);
+ }
+
+ /* reset rx bm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_RX_BM_EN);
+ mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WED_RX_BM_BUSY);
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_BM);
+
+ /* wo change to enable state */
+ val = MTK_WED_WO_STATE_ENABLE;
+ ret = mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
+ MTK_WED_WO_CMD_CHANGE_STATE, &val,
+ sizeof(val), true);
+ if (ret)
+ return ret;
+
+ /* wed_rx_ring_reset */
+ for (i = 0; i < ARRAY_SIZE(dev->rx_ring); i++) {
+ if (!dev->rx_ring[i].desc)
+ continue;
+
+ mtk_wed_ring_reset(&dev->rx_ring[i], MTK_WED_RX_RING_SIZE,
+ false);
+ }
+ mtk_wed_free_rx_buffer(dev);
+
+ return 0;
}
static void
@@ -997,19 +1085,23 @@ mtk_wed_reset_dma(struct mtk_wed_device
true);
}
- if (mtk_wed_poll_busy(dev))
- busy = mtk_wed_check_busy(dev);
-
+ /* 1. reset WED tx DMA */
+ wed_clr(dev, MTK_WED_GLO_CFG, MTK_WED_GLO_CFG_TX_DMA_EN);
+ busy = mtk_wed_poll_busy(dev, MTK_WED_GLO_CFG,
+ MTK_WED_GLO_CFG_TX_DMA_BUSY);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WED_TX_DMA);
} else {
- wed_w32(dev, MTK_WED_RESET_IDX,
- MTK_WED_RESET_IDX_TX |
- MTK_WED_RESET_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_IDX_TX);
wed_w32(dev, MTK_WED_RESET_IDX, 0);
}
- mtk_wdma_rx_reset(dev);
+ /* 2. reset WDMA rx DMA */
+ busy = !!mtk_wdma_rx_reset(dev);
+ wed_clr(dev, MTK_WED_WDMA_GLO_CFG, MTK_WED_WDMA_GLO_CFG_RX_DRV_EN);
+ if (!busy)
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WDMA_GLO_CFG,
+ MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WDMA_INT_AGENT);
@@ -1026,6 +1118,9 @@ mtk_wed_reset_dma(struct mtk_wed_device
MTK_WED_WDMA_GLO_CFG_RST_INIT_COMPLETE);
}
+ /* 3. reset WED WPDMA tx */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
+
for (i = 0; i < 100; i++) {
val = wed_r32(dev, MTK_WED_TX_BM_INTF);
if (FIELD_GET(MTK_WED_TX_BM_INTF_TKFIFO_FDEP, val) == 0x40)
@@ -1033,8 +1128,19 @@ mtk_wed_reset_dma(struct mtk_wed_device
}
mtk_wed_reset(dev, MTK_WED_RESET_TX_FREE_AGENT);
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_TX_BM_EN);
mtk_wed_reset(dev, MTK_WED_RESET_TX_BM);
+ /* 4. reset WED WPDMA tx */
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_TX_DRV_BUSY);
+ wed_clr(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_TX_DRV_EN |
+ MTK_WED_WPDMA_GLO_CFG_RX_DRV_EN);
+ if (!busy)
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_RX_DRV_BUSY);
+
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_INT_AGENT);
mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_TX_DRV);
@@ -1045,6 +1151,17 @@ mtk_wed_reset_dma(struct mtk_wed_device
MTK_WED_WPDMA_RESET_IDX_RX);
wed_w32(dev, MTK_WED_WPDMA_RESET_IDX, 0);
}
+
+ dev->init_done = false;
+ if (dev->hw->version == 1)
+ return;
+
+ if (!busy) {
+ wed_w32(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_WPDMA_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, 0);
+ }
+
+ mtk_wed_rx_reset(dev);
}
static int
@@ -1267,6 +1384,9 @@ mtk_wed_start(struct mtk_wed_device *dev
{
int i;
+ if (mtk_wed_get_rx_capa(dev) && mtk_wed_rx_buffer_alloc(dev))
+ return;
+
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
if (!dev->rx_wdma[i].desc)
mtk_wed_wdma_rx_ring_setup(dev, i, 16);
@@ -1355,10 +1475,6 @@ mtk_wed_attach(struct mtk_wed_device *de
goto out;
if (mtk_wed_get_rx_capa(dev)) {
- ret = mtk_wed_rx_buffer_alloc(dev);
- if (ret)
- goto out;
-
ret = mtk_wed_rro_alloc(dev);
if (ret)
goto out;
--- a/drivers/net/ethernet/mediatek/mtk_wed_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_regs.h
@@ -24,11 +24,15 @@ struct mtk_wdma_desc {
#define MTK_WED_RESET 0x008
#define MTK_WED_RESET_TX_BM BIT(0)
+#define MTK_WED_RESET_RX_BM BIT(1)
#define MTK_WED_RESET_TX_FREE_AGENT BIT(4)
#define MTK_WED_RESET_WPDMA_TX_DRV BIT(8)
#define MTK_WED_RESET_WPDMA_RX_DRV BIT(9)
+#define MTK_WED_RESET_WPDMA_RX_D_DRV BIT(10)
#define MTK_WED_RESET_WPDMA_INT_AGENT BIT(11)
#define MTK_WED_RESET_WED_TX_DMA BIT(12)
+#define MTK_WED_RESET_WED_RX_DMA BIT(13)
+#define MTK_WED_RESET_WDMA_TX_DRV BIT(16)
#define MTK_WED_RESET_WDMA_RX_DRV BIT(17)
#define MTK_WED_RESET_WDMA_INT_AGENT BIT(19)
#define MTK_WED_RESET_RX_RRO_QM BIT(20)
@@ -158,6 +162,8 @@ struct mtk_wdma_desc {
#define MTK_WED_RESET_IDX 0x20c
#define MTK_WED_RESET_IDX_TX GENMASK(3, 0)
#define MTK_WED_RESET_IDX_RX GENMASK(17, 16)
+#define MTK_WED_RESET_IDX_RX_V2 GENMASK(7, 6)
+#define MTK_WED_RESET_WPDMA_IDX_RX GENMASK(31, 30)
#define MTK_WED_TX_MIB(_n) (0x2a0 + (_n) * 4)
#define MTK_WED_RX_MIB(_n) (0x2e0 + (_n) * 4)
@@ -267,6 +273,9 @@ struct mtk_wdma_desc {
#define MTK_WED_WPDMA_RX_D_GLO_CFG 0x75c
#define MTK_WED_WPDMA_RX_D_RX_DRV_EN BIT(0)
+#define MTK_WED_WPDMA_RX_D_RX_DRV_BUSY BIT(1)
+#define MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE BIT(3)
+#define MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE BIT(4)
#define MTK_WED_WPDMA_RX_D_INIT_PHASE_RXEN_SEL GENMASK(11, 7)
#define MTK_WED_WPDMA_RX_D_RXD_READ_LEN GENMASK(31, 24)

View File

@ -1,103 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:55 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset to tx_ring_setup callback
Introduce a reset parameter in the mtk_wed_tx_ring_setup() signature.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
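For context, a hypothetical caller after this change (the mt76-style
field names are assumptions): on a reset path the ring memory already
exists, so reset == true skips the allocation and only reprograms the
ring registers.

if (mtk_wed_device_active(&dev->mmio.wed))
	mtk_wed_device_tx_ring_setup(&dev->mmio.wed, ring_idx,
				     ring_regs, reset);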
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -1181,7 +1181,8 @@ mtk_wed_ring_alloc(struct mtk_wed_device
}
static int
-mtk_wed_wdma_rx_ring_setup(struct mtk_wed_device *dev, int idx, int size)
+mtk_wed_wdma_rx_ring_setup(struct mtk_wed_device *dev, int idx, int size,
+ bool reset)
{
u32 desc_size = sizeof(struct mtk_wdma_desc) * dev->hw->version;
struct mtk_wed_ring *wdma;
@@ -1190,8 +1191,8 @@ mtk_wed_wdma_rx_ring_setup(struct mtk_we
return -EINVAL;
wdma = &dev->rx_wdma[idx];
- if (mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE, desc_size,
- true))
+ if (!reset && mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE,
+ desc_size, true))
return -ENOMEM;
wdma_w32(dev, MTK_WDMA_RING_RX(idx) + MTK_WED_RING_OFS_BASE,
@@ -1389,7 +1390,7 @@ mtk_wed_start(struct mtk_wed_device *dev
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
if (!dev->rx_wdma[i].desc)
- mtk_wed_wdma_rx_ring_setup(dev, i, 16);
+ mtk_wed_wdma_rx_ring_setup(dev, i, 16, false);
mtk_wed_hw_init(dev);
mtk_wed_configure_irq(dev, irq_mask);
@@ -1498,7 +1499,8 @@ unlock:
}
static int
-mtk_wed_tx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs)
+mtk_wed_tx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs,
+ bool reset)
{
struct mtk_wed_ring *ring = &dev->tx_ring[idx];
@@ -1517,11 +1519,12 @@ mtk_wed_tx_ring_setup(struct mtk_wed_dev
if (WARN_ON(idx >= ARRAY_SIZE(dev->tx_ring)))
return -EINVAL;
- if (mtk_wed_ring_alloc(dev, ring, MTK_WED_TX_RING_SIZE,
- sizeof(*ring->desc), true))
+ if (!reset && mtk_wed_ring_alloc(dev, ring, MTK_WED_TX_RING_SIZE,
+ sizeof(*ring->desc), true))
return -ENOMEM;
- if (mtk_wed_wdma_rx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
+ if (mtk_wed_wdma_rx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE,
+ reset))
return -ENOMEM;
ring->reg_base = MTK_WED_RING_TX(idx);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -158,7 +158,7 @@ struct mtk_wed_device {
struct mtk_wed_ops {
int (*attach)(struct mtk_wed_device *dev);
int (*tx_ring_setup)(struct mtk_wed_device *dev, int ring,
- void __iomem *regs);
+ void __iomem *regs, bool reset);
int (*rx_ring_setup)(struct mtk_wed_device *dev, int ring,
void __iomem *regs);
int (*txfree_ring_setup)(struct mtk_wed_device *dev,
@@ -216,8 +216,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
#define mtk_wed_device_active(_dev) !!(_dev)->ops
#define mtk_wed_device_detach(_dev) (_dev)->ops->detach(_dev)
#define mtk_wed_device_start(_dev, _mask) (_dev)->ops->start(_dev, _mask)
-#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) \
- (_dev)->ops->tx_ring_setup(_dev, _ring, _regs)
+#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs, _reset) \
+ (_dev)->ops->tx_ring_setup(_dev, _ring, _regs, _reset)
#define mtk_wed_device_txfree_ring_setup(_dev, _regs) \
(_dev)->ops->txfree_ring_setup(_dev, _regs)
#define mtk_wed_device_reg_read(_dev, _reg) \
@@ -243,7 +243,7 @@ static inline bool mtk_wed_device_active
}
#define mtk_wed_device_detach(_dev) do {} while (0)
#define mtk_wed_device_start(_dev, _mask) do {} while (0)
-#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) -ENODEV
+#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs, _reset) -ENODEV
#define mtk_wed_device_txfree_ring_setup(_dev, _ring, _regs) -ENODEV
#define mtk_wed_device_reg_read(_dev, _reg) 0
#define mtk_wed_device_reg_write(_dev, _reg, _val) do {} while (0)

View File

@ -1,103 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 1 Dec 2022 16:26:53 +0100
Subject: [PATCH] net: ethernet: mtk_wed: fix sleep while atomic in
mtk_wed_wo_queue_refill
In order to fix the following sleep-while-atomic bug, always allocate
pages with GFP_ATOMIC in mtk_wed_wo_queue_refill(), since
page_frag_alloc() runs inside a spin_lock critical section.
[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c
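Reduced to a hedged standalone sketch, the bug class looks like this
(the queue fields follow mtk_wed_wo.h):

	spin_lock_bh(&q->lock);
	/* No sleeping allowed while the BH spinlock is held: GFP_KERNEL
	 * may block in the page allocator, hence the splat above. */
	buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC); /* safe */
	spin_unlock_bh(&q->lock);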
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -133,17 +133,18 @@ mtk_wed_wo_dequeue(struct mtk_wed_wo *wo
static int
mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
- gfp_t gfp, bool rx)
+ bool rx)
{
enum dma_data_direction dir = rx ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
int n_buf = 0;
spin_lock_bh(&q->lock);
while (q->queued < q->n_desc) {
- void *buf = page_frag_alloc(&q->cache, q->buf_size, gfp);
struct mtk_wed_wo_queue_entry *entry;
dma_addr_t addr;
+ void *buf;
+ buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC);
if (!buf)
break;
@@ -215,7 +216,7 @@ mtk_wed_wo_rx_run_queue(struct mtk_wed_w
mtk_wed_mcu_rx_unsolicited_event(wo, skb);
}
- if (mtk_wed_wo_queue_refill(wo, q, GFP_ATOMIC, true)) {
+ if (mtk_wed_wo_queue_refill(wo, q, true)) {
u32 index = (q->head - 1) % q->n_desc;
mtk_wed_wo_queue_kick(wo, q, index);
@@ -432,7 +433,7 @@ mtk_wed_wo_hardware_init(struct mtk_wed_
if (ret)
goto error;
- mtk_wed_wo_queue_refill(wo, &wo->q_tx, GFP_KERNEL, false);
+ mtk_wed_wo_queue_refill(wo, &wo->q_tx, false);
mtk_wed_wo_queue_reset(wo, &wo->q_tx);
regs.desc_base = MTK_WED_WO_CCIF_DUMMY5;
@@ -446,7 +447,7 @@ mtk_wed_wo_hardware_init(struct mtk_wed_
if (ret)
goto error;
- mtk_wed_wo_queue_refill(wo, &wo->q_rx, GFP_KERNEL, true);
+ mtk_wed_wo_queue_refill(wo, &wo->q_rx, true);
mtk_wed_wo_queue_reset(wo, &wo->q_rx);
/* rx queue irqmask */

View File

@ -1,52 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue, 10 Jan 2023 10:31:26 +0100
Subject: [PATCH] net: ethernet: mtk_wed: get rid of queue lock for rx queue
The queue spinlock is currently held in the mtk_wed_wo_queue_rx_clean
and mtk_wed_wo_queue_refill routines for the MTK Wireless Ethernet
Dispatcher MCU rx queue. mtk_wed_wo_queue_refill() runs during
initialization and in the rx tasklet, while mtk_wed_wo_queue_rx_clean()
runs in mtk_wed_wo_hw_deinit() during the hw de-init phase, after the
rx tasklet has been disabled. Since the mtk_wed_wo_queue_rx_clean and
mtk_wed_wo_queue_refill routines can't run concurrently, get rid of the
spinlock for the MCU rx queue.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/36ec3b729542ea60898471d890796f745479ba32.1673342990.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -138,7 +138,6 @@ mtk_wed_wo_queue_refill(struct mtk_wed_w
enum dma_data_direction dir = rx ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
int n_buf = 0;
- spin_lock_bh(&q->lock);
while (q->queued < q->n_desc) {
struct mtk_wed_wo_queue_entry *entry;
dma_addr_t addr;
@@ -172,7 +171,6 @@ mtk_wed_wo_queue_refill(struct mtk_wed_w
q->queued++;
n_buf++;
}
- spin_unlock_bh(&q->lock);
return n_buf;
}
@@ -316,7 +314,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed
{
struct page *page;
- spin_lock_bh(&q->lock);
for (;;) {
void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
@@ -325,7 +322,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed
skb_free_frag(buf);
}
- spin_unlock_bh(&q->lock);
if (!q->cache.va)
return;

View File

@ -1,75 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 12 Jan 2023 10:21:29 +0100
Subject: [PATCH] net: ethernet: mtk_wed: get rid of queue lock for tx queue
Similar to the MTK Wireless Ethernet Dispatcher (WED) MCU rx queue,
we do not need to protect the WED MCU tx queue with a spin lock, since
the tx queue is only accessed in the following two routines:
- mtk_wed_wo_queue_tx_skb():
  it runs at initialization and during mt7915 normal operation.
  Moreover, MCU messages are serialized through the MCU mutex.
- mtk_wed_wo_queue_tx_clean():
  it runs only at mt7915 driver module unload, when no more messages
  are sent to the MCU.
Remove tx queue spinlock.
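Given the serialization described above, a hedged sketch of the only
concurrent-looking path (field names follow mtk_wed_wo.h and are
assumptions here):

/* Every MCU message already holds the MCU mutex, so only one thread
 * can touch the tx queue at a time; the queue spinlock added nothing. */
mutex_lock(&wo->mcu.mutex);
ret = mtk_wed_wo_queue_tx_skb(wo, &wo->q_tx, skb);
mutex_unlock(&wo->mcu.mutex);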
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7bd0337b2a13ab1a63673b7c03fd35206b3b284e.1673515140.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -258,7 +258,6 @@ mtk_wed_wo_queue_alloc(struct mtk_wed_wo
int n_desc, int buf_size, int index,
struct mtk_wed_wo_queue_regs *regs)
{
- spin_lock_init(&q->lock);
q->regs = *regs;
q->n_desc = n_desc;
q->buf_size = buf_size;
@@ -290,7 +289,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed
struct page *page;
int i;
- spin_lock_bh(&q->lock);
for (i = 0; i < q->n_desc; i++) {
struct mtk_wed_wo_queue_entry *entry = &q->entry[i];
@@ -299,7 +297,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed
skb_free_frag(entry->buf);
entry->buf = NULL;
}
- spin_unlock_bh(&q->lock);
if (!q->cache.va)
return;
@@ -347,8 +344,6 @@ int mtk_wed_wo_queue_tx_skb(struct mtk_w
int ret = 0, index;
u32 ctrl;
- spin_lock_bh(&q->lock);
-
q->tail = mtk_wed_mmio_r32(wo, q->regs.dma_idx);
index = (q->head + 1) % q->n_desc;
if (q->tail == index) {
@@ -379,8 +374,6 @@ int mtk_wed_wo_queue_tx_skb(struct mtk_w
mtk_wed_wo_queue_kick(wo, q, q->head);
mtk_wed_wo_kickout(wo);
out:
- spin_unlock_bh(&q->lock);
-
dev_kfree_skb(skb);
return ret;
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h
@@ -211,7 +211,6 @@ struct mtk_wed_wo_queue {
struct mtk_wed_wo_queue_regs regs;
struct page_frag_cache cache;
- spinlock_t lock;
struct mtk_wed_wo_queue_desc *desc;
dma_addr_t desc_dma;

View File

@ -1,70 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:28 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: introduce mtk_hw_reset utility
routine
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3256,6 +3256,27 @@ static void mtk_set_mcr_max_rx(struct mt
mtk_w32(mac->hw, mcr_new, MTK_MAC_MCR(mac->id));
}
+static void mtk_hw_reset(struct mtk_eth *eth)
+{
+ u32 val;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
+ regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN, 0);
+ val = RSTCTRL_PPE0_V2;
+ } else {
+ val = RSTCTRL_PPE0;
+ }
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val |= RSTCTRL_PPE1;
+
+ ethsys_reset(eth, RSTCTRL_ETH | RSTCTRL_FE | val);
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN,
+ 0x3ffffff);
+}
+
static int mtk_hw_init(struct mtk_eth *eth)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
@@ -3295,22 +3316,9 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
}
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
- regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN, 0);
- val = RSTCTRL_PPE0_V2;
- } else {
- val = RSTCTRL_PPE0;
- }
-
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
- val |= RSTCTRL_PPE1;
-
- ethsys_reset(eth, RSTCTRL_ETH | RSTCTRL_FE | val);
+ mtk_hw_reset(eth);
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
- regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN,
- 0x3ffffff);
-
/* Set FE to PDMAv2 if necessary */
val = mtk_r32(eth, MTK_FE_GLO_MISC);
mtk_w32(eth, val | BIT(4), MTK_FE_GLO_MISC);

View File

@ -1,107 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:29 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: introduce mtk_hw_warm_reset
support
Introduce the mtk_hw_warm_reset utility routine. This is a preliminary
patch to align the reset procedure with the vendor SDK and avoid
powering down the chip during hw reset.
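A hedged reading of the poll step used below: re-read ETHSYS_RSTCTRL
through mtk_hw_reset_read(eth) every 1 us until RSTCTRL_FE is observed,
giving up with -ETIMEDOUT after 1000 us, in which case the code falls
back to a full cold reset.

if (readx_poll_timeout_atomic(mtk_hw_reset_read, eth, val,
			      val & RSTCTRL_FE, 1, 1000))
	mtk_hw_reset(eth);	/* cold-reset fallback */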
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3277,7 +3277,54 @@ static void mtk_hw_reset(struct mtk_eth
0x3ffffff);
}
-static int mtk_hw_init(struct mtk_eth *eth)
+static u32 mtk_hw_reset_read(struct mtk_eth *eth)
+{
+ u32 val;
+
+ regmap_read(eth->ethsys, ETHSYS_RSTCTRL, &val);
+ return val;
+}
+
+static void mtk_hw_warm_reset(struct mtk_eth *eth)
+{
+ u32 rst_mask, val;
+
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, RSTCTRL_FE,
+ RSTCTRL_FE);
+ if (readx_poll_timeout_atomic(mtk_hw_reset_read, eth, val,
+ val & RSTCTRL_FE, 1, 1000)) {
+ dev_err(eth->dev, "warm reset failed\n");
+ mtk_hw_reset(eth);
+ return;
+ }
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ rst_mask = RSTCTRL_ETH | RSTCTRL_PPE0_V2;
+ else
+ rst_mask = RSTCTRL_ETH | RSTCTRL_PPE0;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ rst_mask |= RSTCTRL_PPE1;
+
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, rst_mask, rst_mask);
+
+ udelay(1);
+ val = mtk_hw_reset_read(eth);
+ if (!(val & rst_mask))
+ dev_err(eth->dev, "warm reset stage0 failed %08x (%08x)\n",
+ val, rst_mask);
+
+ rst_mask |= RSTCTRL_FE;
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, rst_mask, ~rst_mask);
+
+ udelay(1);
+ val = mtk_hw_reset_read(eth);
+ if (val & rst_mask)
+ dev_err(eth->dev, "warm reset stage1 failed %08x (%08x)\n",
+ val, rst_mask);
+}
+
+static int mtk_hw_init(struct mtk_eth *eth, bool reset)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
ETHSYS_DMA_AG_MAP_PPE;
@@ -3316,7 +3363,12 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
}
- mtk_hw_reset(eth);
+ msleep(100);
+
+ if (reset)
+ mtk_hw_warm_reset(eth);
+ else
+ mtk_hw_reset(eth);
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
/* Set FE to PDMAv2 if necessary */
@@ -3507,7 +3559,7 @@ static void mtk_pending_work(struct work
if (eth->dev->pins)
pinctrl_select_state(eth->dev->pins->p,
eth->dev->pins->default_state);
- mtk_hw_init(eth);
+ mtk_hw_init(eth, true);
/* restart DMA and enable IRQs */
for (i = 0; i < MTK_MAC_COUNT; i++) {
@@ -4109,7 +4161,7 @@ static int mtk_probe(struct platform_dev
eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
INIT_WORK(&eth->pending_work, mtk_pending_work);
- err = mtk_hw_init(eth);
+ err = mtk_hw_init(eth, false);
if (err)
goto err_wed_exit;

View File

@ -1,262 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:30 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: align reset procedure to vendor
sdk
Avoid powering down the ethernet chip during hw reset and align the
reset procedure with the vendor SDK.
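The new mtk_hw_reset_check() below ORs six FE interrupt-status bits one
by one; an equivalent single-mask form, shown here only for readability
(a behavioral no-op):

static bool mtk_hw_reset_check(struct mtk_eth *eth)
{
	u32 val = mtk_r32(eth, MTK_INT_STATUS2);

	return !!(val & (MTK_FE_INT_FQ_EMPTY | MTK_FE_INT_RFIFO_UF |
			 MTK_FE_INT_RFIFO_OV | MTK_FE_INT_TSO_FAIL |
			 MTK_FE_INT_TSO_ALIGN | MTK_FE_INT_TSO_ILLEGAL));
}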
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2844,14 +2844,29 @@ static void mtk_dma_free(struct mtk_eth
kfree(eth->scratch_head);
}
+static bool mtk_hw_reset_check(struct mtk_eth *eth)
+{
+ u32 val = mtk_r32(eth, MTK_INT_STATUS2);
+
+ return (val & MTK_FE_INT_FQ_EMPTY) || (val & MTK_FE_INT_RFIFO_UF) ||
+ (val & MTK_FE_INT_RFIFO_OV) || (val & MTK_FE_INT_TSO_FAIL) ||
+ (val & MTK_FE_INT_TSO_ALIGN) || (val & MTK_FE_INT_TSO_ILLEGAL);
+}
+
static void mtk_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
+ if (test_bit(MTK_RESETTING, &eth->state))
+ return;
+
+ if (!mtk_hw_reset_check(eth))
+ return;
+
eth->netdev[mac->id]->stats.tx_errors++;
- netif_err(eth, tx_err, dev,
- "transmit timed out\n");
+ netif_err(eth, tx_err, dev, "transmit timed out\n");
+
schedule_work(&eth->pending_work);
}
@@ -3331,15 +3346,17 @@ static int mtk_hw_init(struct mtk_eth *e
const struct mtk_reg_map *reg_map = eth->soc->reg_map;
int i, val, ret;
- if (test_and_set_bit(MTK_HW_INIT, &eth->state))
+ if (!reset && test_and_set_bit(MTK_HW_INIT, &eth->state))
return 0;
- pm_runtime_enable(eth->dev);
- pm_runtime_get_sync(eth->dev);
+ if (!reset) {
+ pm_runtime_enable(eth->dev);
+ pm_runtime_get_sync(eth->dev);
- ret = mtk_clk_enable(eth);
- if (ret)
- goto err_disable_pm;
+ ret = mtk_clk_enable(eth);
+ if (ret)
+ goto err_disable_pm;
+ }
if (eth->ethsys)
regmap_update_bits(eth->ethsys, ETHSYS_DMA_AG_MAP, dma_mask,
@@ -3468,8 +3485,10 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
err_disable_pm:
- pm_runtime_put_sync(eth->dev);
- pm_runtime_disable(eth->dev);
+ if (!reset) {
+ pm_runtime_put_sync(eth->dev);
+ pm_runtime_disable(eth->dev);
+ }
return ret;
}
@@ -3531,30 +3550,53 @@ static int mtk_do_ioctl(struct net_devic
return -EOPNOTSUPP;
}
+static void mtk_prepare_for_reset(struct mtk_eth *eth)
+{
+ u32 val;
+ int i;
+
+	/* disable FE P3 and P4 */
+ val = mtk_r32(eth, MTK_FE_GLO_CFG) | MTK_FE_LINK_DOWN_P3;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val |= MTK_FE_LINK_DOWN_P4;
+ mtk_w32(eth, val, MTK_FE_GLO_CFG);
+
+ /* adjust PPE configurations to prepare for reset */
+ for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+ mtk_ppe_prepare_reset(eth->ppe[i]);
+
+ /* disable NETSYS interrupts */
+ mtk_w32(eth, 0, MTK_FE_INT_ENABLE);
+
+ /* force link down GMAC */
+ for (i = 0; i < 2; i++) {
+ val = mtk_r32(eth, MTK_MAC_MCR(i)) & ~MAC_MCR_FORCE_LINK;
+ mtk_w32(eth, val, MTK_MAC_MCR(i));
+ }
+}
+
static void mtk_pending_work(struct work_struct *work)
{
struct mtk_eth *eth = container_of(work, struct mtk_eth, pending_work);
- int err, i;
unsigned long restart = 0;
+ u32 val;
+ int i;
rtnl_lock();
-
- dev_dbg(eth->dev, "[%s][%d] reset\n", __func__, __LINE__);
set_bit(MTK_RESETTING, &eth->state);
+ mtk_prepare_for_reset(eth);
+
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->netdev[i])
+ if (!eth->netdev[i] || !netif_running(eth->netdev[i]))
continue;
+
mtk_stop(eth->netdev[i]);
__set_bit(i, &restart);
}
- dev_dbg(eth->dev, "[%s][%d] mtk_stop ends\n", __func__, __LINE__);
- /* restart underlying hardware such as power, clock, pin mux
- * and the connected phy
- */
- mtk_hw_deinit(eth);
+ usleep_range(15000, 16000);
if (eth->dev->pins)
pinctrl_select_state(eth->dev->pins->p,
@@ -3565,15 +3607,19 @@ static void mtk_pending_work(struct work
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!test_bit(i, &restart))
continue;
- err = mtk_open(eth->netdev[i]);
- if (err) {
+
+ if (mtk_open(eth->netdev[i])) {
netif_alert(eth, ifup, eth->netdev[i],
- "Driver up/down cycle failed, closing device.\n");
+ "Driver up/down cycle failed\n");
dev_close(eth->netdev[i]);
}
}
- dev_dbg(eth->dev, "[%s][%d] reset done\n", __func__, __LINE__);
+	/* enable FE P3 and P4 */
+ val = mtk_r32(eth, MTK_FE_GLO_CFG) & ~MTK_FE_LINK_DOWN_P3;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val &= ~MTK_FE_LINK_DOWN_P4;
+ mtk_w32(eth, val, MTK_FE_GLO_CFG);
clear_bit(MTK_RESETTING, &eth->state);
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -72,12 +72,24 @@
#define MTK_HW_LRO_REPLACE_DELTA 1000
#define MTK_HW_LRO_SDL_REMAIN_ROOM 1522
+/* Frame Engine Global Configuration */
+#define MTK_FE_GLO_CFG 0x00
+#define MTK_FE_LINK_DOWN_P3 BIT(11)
+#define MTK_FE_LINK_DOWN_P4 BIT(12)
+
/* Frame Engine Global Reset Register */
#define MTK_RST_GL 0x04
#define RST_GL_PSE BIT(0)
/* Frame Engine Interrupt Status Register */
#define MTK_INT_STATUS2 0x08
+#define MTK_FE_INT_ENABLE 0x0c
+#define MTK_FE_INT_FQ_EMPTY BIT(8)
+#define MTK_FE_INT_TSO_FAIL BIT(12)
+#define MTK_FE_INT_TSO_ILLEGAL BIT(13)
+#define MTK_FE_INT_TSO_ALIGN BIT(14)
+#define MTK_FE_INT_RFIFO_OV BIT(18)
+#define MTK_FE_INT_RFIFO_UF BIT(19)
#define MTK_GDM1_AF BIT(28)
#define MTK_GDM2_AF BIT(29)
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -710,6 +710,33 @@ int mtk_foe_entry_idle_time(struct mtk_p
return __mtk_foe_entry_idle_time(ppe, entry->data.ib1);
}
+int mtk_ppe_prepare_reset(struct mtk_ppe *ppe)
+{
+ if (!ppe)
+ return -EINVAL;
+
+ /* disable KA */
+ ppe_clear(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_KEEPALIVE);
+ ppe_clear(ppe, MTK_PPE_BIND_LMT1, MTK_PPE_NTU_KEEPALIVE);
+ ppe_w32(ppe, MTK_PPE_KEEPALIVE, 0);
+ usleep_range(10000, 11000);
+
+ /* set KA timer to maximum */
+ ppe_set(ppe, MTK_PPE_BIND_LMT1, MTK_PPE_NTU_KEEPALIVE);
+ ppe_w32(ppe, MTK_PPE_KEEPALIVE, 0xffffffff);
+
+ /* set KA tick select */
+ ppe_set(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_TICK_SEL);
+ ppe_set(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_KEEPALIVE);
+ usleep_range(10000, 11000);
+
+ /* disable scan mode */
+ ppe_clear(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_SCAN_MODE);
+ usleep_range(10000, 11000);
+
+ return mtk_ppe_wait_busy(ppe);
+}
+
struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base,
int version, int index)
{
--- a/drivers/net/ethernet/mediatek/mtk_ppe.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.h
@@ -306,6 +306,7 @@ struct mtk_ppe *mtk_ppe_init(struct mtk_
void mtk_ppe_deinit(struct mtk_eth *eth);
void mtk_ppe_start(struct mtk_ppe *ppe);
int mtk_ppe_stop(struct mtk_ppe *ppe);
+int mtk_ppe_prepare_reset(struct mtk_ppe *ppe);
void __mtk_ppe_check_skb(struct mtk_ppe *ppe, struct sk_buff *skb, u16 hash);
--- a/drivers/net/ethernet/mediatek/mtk_ppe_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_regs.h
@@ -58,6 +58,12 @@
#define MTK_PPE_TB_CFG_SCAN_MODE GENMASK(17, 16)
#define MTK_PPE_TB_CFG_HASH_DEBUG GENMASK(19, 18)
#define MTK_PPE_TB_CFG_INFO_SEL BIT(20)
+#define MTK_PPE_TB_TICK_SEL BIT(24)
+
+#define MTK_PPE_BIND_LMT1 0x230
+#define MTK_PPE_NTU_KEEPALIVE GENMASK(23, 16)
+
+#define MTK_PPE_KEEPALIVE 0x234
enum {
MTK_PPE_SCAN_MODE_DISABLED,

View File

@ -1,249 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:31 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: add dma checks to
mtk_hw_reset_check
Introduce the mtk_hw_check_dma_hang routine to monitor possible DMA hangs.
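The detection below is deliberately hysteretic; a hedged distillation
of one of the three checks:

/* The same stall signature (wdidx unchanged, tx driver busy, CDM fifo
 * full, output queues free) must be seen on three consecutive polls of
 * the 1-second monitor (~3 s) before a reset is requested; the monitor
 * work then schedules eth->pending_work. */
if (++eth->reset.wdma_hang_count > 2) {
	eth->reset.wdma_hang_count = 0;
	ret = true;
}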
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -50,6 +50,7 @@ static const struct mtk_reg_map mtk_reg_
.delay_irq = 0x0a0c,
.irq_status = 0x0a20,
.irq_mask = 0x0a28,
+ .adma_rx_dbg0 = 0x0a38,
.int_grp = 0x0a50,
},
.qdma = {
@@ -79,6 +80,8 @@ static const struct mtk_reg_map mtk_reg_
[0] = 0x2800,
[1] = 0x2c00,
},
+ .pse_iq_sta = 0x0110,
+ .pse_oq_sta = 0x0118,
};
static const struct mtk_reg_map mt7628_reg_map = {
@@ -109,6 +112,7 @@ static const struct mtk_reg_map mt7986_r
.delay_irq = 0x620c,
.irq_status = 0x6220,
.irq_mask = 0x6228,
+ .adma_rx_dbg0 = 0x6238,
.int_grp = 0x6250,
},
.qdma = {
@@ -138,6 +142,8 @@ static const struct mtk_reg_map mt7986_r
[0] = 0x4800,
[1] = 0x4c00,
},
+ .pse_iq_sta = 0x0180,
+ .pse_oq_sta = 0x01a0,
};
/* strings used by ethtool */
@@ -3339,6 +3345,102 @@ static void mtk_hw_warm_reset(struct mtk
val, rst_mask);
}
+static bool mtk_hw_check_dma_hang(struct mtk_eth *eth)
+{
+ const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+ bool gmac1_tx, gmac2_tx, gdm1_tx, gdm2_tx;
+ bool oq_hang, cdm1_busy, adma_busy;
+ bool wtx_busy, cdm_full, oq_free;
+ u32 wdidx, val, gdm1_fc, gdm2_fc;
+ bool qfsm_hang, qfwd_hang;
+ bool ret = false;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
+ return false;
+
+ /* WDMA sanity checks */
+ wdidx = mtk_r32(eth, reg_map->wdma_base[0] + 0xc);
+
+ val = mtk_r32(eth, reg_map->wdma_base[0] + 0x204);
+ wtx_busy = FIELD_GET(MTK_TX_DMA_BUSY, val);
+
+ val = mtk_r32(eth, reg_map->wdma_base[0] + 0x230);
+ cdm_full = !FIELD_GET(MTK_CDM_TXFIFO_RDY, val);
+
+ oq_free = (!(mtk_r32(eth, reg_map->pse_oq_sta) & GENMASK(24, 16)) &&
+ !(mtk_r32(eth, reg_map->pse_oq_sta + 0x4) & GENMASK(8, 0)) &&
+ !(mtk_r32(eth, reg_map->pse_oq_sta + 0x10) & GENMASK(24, 16)));
+
+ if (wdidx == eth->reset.wdidx && wtx_busy && cdm_full && oq_free) {
+ if (++eth->reset.wdma_hang_count > 2) {
+ eth->reset.wdma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ /* QDMA sanity checks */
+ qfsm_hang = !!mtk_r32(eth, reg_map->qdma.qtx_cfg + 0x234);
+ qfwd_hang = !mtk_r32(eth, reg_map->qdma.qtx_cfg + 0x308);
+
+ gdm1_tx = FIELD_GET(GENMASK(31, 16), mtk_r32(eth, MTK_FE_GDM1_FSM)) > 0;
+ gdm2_tx = FIELD_GET(GENMASK(31, 16), mtk_r32(eth, MTK_FE_GDM2_FSM)) > 0;
+ gmac1_tx = FIELD_GET(GENMASK(31, 24), mtk_r32(eth, MTK_MAC_FSM(0))) != 1;
+ gmac2_tx = FIELD_GET(GENMASK(31, 24), mtk_r32(eth, MTK_MAC_FSM(1))) != 1;
+ gdm1_fc = mtk_r32(eth, reg_map->gdm1_cnt + 0x24);
+ gdm2_fc = mtk_r32(eth, reg_map->gdm1_cnt + 0x64);
+
+ if (qfsm_hang && qfwd_hang &&
+ ((gdm1_tx && gmac1_tx && gdm1_fc < 1) ||
+ (gdm2_tx && gmac2_tx && gdm2_fc < 1))) {
+ if (++eth->reset.qdma_hang_count > 2) {
+ eth->reset.qdma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ /* ADMA sanity checks */
+ oq_hang = !!(mtk_r32(eth, reg_map->pse_oq_sta) & GENMASK(8, 0));
+ cdm1_busy = !!(mtk_r32(eth, MTK_FE_CDM1_FSM) & GENMASK(31, 16));
+ adma_busy = !(mtk_r32(eth, reg_map->pdma.adma_rx_dbg0) & GENMASK(4, 0)) &&
+ !(mtk_r32(eth, reg_map->pdma.adma_rx_dbg0) & BIT(6));
+
+ if (oq_hang && cdm1_busy && adma_busy) {
+ if (++eth->reset.adma_hang_count > 2) {
+ eth->reset.adma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ eth->reset.wdma_hang_count = 0;
+ eth->reset.qdma_hang_count = 0;
+ eth->reset.adma_hang_count = 0;
+out:
+ eth->reset.wdidx = wdidx;
+
+ return ret;
+}
+
+static void mtk_hw_reset_monitor_work(struct work_struct *work)
+{
+ struct delayed_work *del_work = to_delayed_work(work);
+ struct mtk_eth *eth = container_of(del_work, struct mtk_eth,
+ reset.monitor_work);
+
+ if (test_bit(MTK_RESETTING, &eth->state))
+ goto out;
+
+ /* DMA stuck checks */
+ if (mtk_hw_check_dma_hang(eth))
+ schedule_work(&eth->pending_work);
+
+out:
+ schedule_delayed_work(&eth->reset.monitor_work,
+ MTK_DMA_MONITOR_TIMEOUT);
+}
+
static int mtk_hw_init(struct mtk_eth *eth, bool reset)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
@@ -3657,6 +3759,7 @@ static int mtk_cleanup(struct mtk_eth *e
mtk_unreg_dev(eth);
mtk_free_dev(eth);
cancel_work_sync(&eth->pending_work);
+ cancel_delayed_work_sync(&eth->reset.monitor_work);
return 0;
}
@@ -4094,6 +4197,7 @@ static int mtk_probe(struct platform_dev
eth->rx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
INIT_WORK(&eth->rx_dim.work, mtk_dim_rx);
+ INIT_DELAYED_WORK(&eth->reset.monitor_work, mtk_hw_reset_monitor_work);
eth->tx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
INIT_WORK(&eth->tx_dim.work, mtk_dim_tx);
@@ -4296,6 +4400,8 @@ static int mtk_probe(struct platform_dev
netif_napi_add(&eth->dummy_dev, &eth->rx_napi, mtk_napi_rx);
platform_set_drvdata(pdev, eth);
+ schedule_delayed_work(&eth->reset.monitor_work,
+ MTK_DMA_MONITOR_TIMEOUT);
return 0;
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -257,6 +257,8 @@
#define MTK_RX_DONE_INT_V2 BIT(14)
+#define MTK_CDM_TXFIFO_RDY BIT(7)
+
/* QDMA Interrupt grouping registers */
#define MTK_RLS_DONE_INT BIT(0)
@@ -542,6 +544,17 @@
#define MT7628_SDM_RBCNT (MT7628_SDM_OFFSET + 0x10c)
#define MT7628_SDM_CS_ERR (MT7628_SDM_OFFSET + 0x110)
+#define MTK_FE_CDM1_FSM 0x220
+#define MTK_FE_CDM2_FSM 0x224
+#define MTK_FE_CDM3_FSM 0x238
+#define MTK_FE_CDM4_FSM 0x298
+#define MTK_FE_CDM5_FSM 0x318
+#define MTK_FE_CDM6_FSM 0x328
+#define MTK_FE_GDM1_FSM 0x228
+#define MTK_FE_GDM2_FSM 0x22C
+
+#define MTK_MAC_FSM(x) (0x1010C + ((x) * 0x100))
+
struct mtk_rx_dma {
unsigned int rxd1;
unsigned int rxd2;
@@ -938,6 +951,7 @@ struct mtk_reg_map {
u32 delay_irq; /* delay interrupt */
u32 irq_status; /* interrupt status */
u32 irq_mask; /* interrupt mask */
+ u32 adma_rx_dbg0;
u32 int_grp;
} pdma;
struct {
@@ -964,6 +978,8 @@ struct mtk_reg_map {
u32 gdma_to_ppe;
u32 ppe_base;
u32 wdma_base[2];
+ u32 pse_iq_sta;
+ u32 pse_oq_sta;
};
/* struct mtk_eth_data - This is the structure holding all differences
@@ -1006,6 +1022,8 @@ struct mtk_soc_data {
} txrx;
};
+#define MTK_DMA_MONITOR_TIMEOUT msecs_to_jiffies(1000)
+
/* currently no SoC has more than 2 macs */
#define MTK_MAX_DEVS 2
@@ -1128,6 +1146,14 @@ struct mtk_eth {
struct rhashtable flow_table;
struct bpf_prog __rcu *prog;
+
+ struct {
+ struct delayed_work monitor_work;
+ u32 wdidx;
+ u8 wdma_hang_count;
+ u8 qdma_hang_count;
+ u8 adma_hang_count;
+ } reset;
};
/* struct mtk_mac - the structure that holds the info about the MACs of the

View File

@ -1,124 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:32 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset/reset_complete callbacks
Introduce reset and reset_complete wlan callbacks to schedule a WLAN
driver reset when the ethernet/wed driver is resetting.
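On the WLAN side, a driver opts in by filling the two new callbacks at
WED attach time; a hedged sketch (the mt7915_* handler names are
assumptions):

/* reset must block until the WLAN reset has completed;
 * reset_complete is a notification only. */
wed->wlan.reset = mt7915_mmio_wed_reset;
wed->wlan.reset_complete = mt7915_mmio_wed_reset_complete;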
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3688,6 +3688,11 @@ static void mtk_pending_work(struct work
set_bit(MTK_RESETTING, &eth->state);
mtk_prepare_for_reset(eth);
+ mtk_wed_fe_reset();
+	/* Run the reset preliminary configuration again in order to avoid
+	 * any possible race during the FE reset, since the reset can run
+	 * with the RTNL lock released.
+	 */
+ mtk_prepare_for_reset(eth);
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
@@ -3725,6 +3730,8 @@ static void mtk_pending_work(struct work
clear_bit(MTK_RESETTING, &eth->state);
+ mtk_wed_fe_reset_complete();
+
rtnl_unlock();
}
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -205,6 +205,48 @@ mtk_wed_wo_reset(struct mtk_wed_device *
iounmap(reg);
}
+void mtk_wed_fe_reset(void)
+{
+ int i;
+
+ mutex_lock(&hw_lock);
+
+ for (i = 0; i < ARRAY_SIZE(hw_list); i++) {
+ struct mtk_wed_hw *hw = hw_list[i];
+ struct mtk_wed_device *dev = hw->wed_dev;
+ int err;
+
+ if (!dev || !dev->wlan.reset)
+ continue;
+
+ /* reset callback blocks until WLAN reset is completed */
+ err = dev->wlan.reset(dev);
+ if (err)
+ dev_err(dev->dev, "wlan reset failed: %d\n", err);
+ }
+
+ mutex_unlock(&hw_lock);
+}
+
+void mtk_wed_fe_reset_complete(void)
+{
+ int i;
+
+ mutex_lock(&hw_lock);
+
+ for (i = 0; i < ARRAY_SIZE(hw_list); i++) {
+ struct mtk_wed_hw *hw = hw_list[i];
+ struct mtk_wed_device *dev = hw->wed_dev;
+
+ if (!dev || !dev->wlan.reset_complete)
+ continue;
+
+ dev->wlan.reset_complete(dev);
+ }
+
+ mutex_unlock(&hw_lock);
+}
+
static struct mtk_wed_hw *
mtk_wed_assign(struct mtk_wed_device *dev)
{
--- a/drivers/net/ethernet/mediatek/mtk_wed.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed.h
@@ -128,6 +128,8 @@ void mtk_wed_add_hw(struct device_node *
void mtk_wed_exit(void);
int mtk_wed_flow_add(int index);
void mtk_wed_flow_remove(int index);
+void mtk_wed_fe_reset(void);
+void mtk_wed_fe_reset_complete(void);
#else
static inline void
mtk_wed_add_hw(struct device_node *np, struct mtk_eth *eth,
@@ -147,6 +149,13 @@ static inline void mtk_wed_flow_remove(i
{
}
+static inline void mtk_wed_fe_reset(void)
+{
+}
+
+static inline void mtk_wed_fe_reset_complete(void)
+{
+}
#endif
#ifdef CONFIG_DEBUG_FS
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -151,6 +151,8 @@ struct mtk_wed_device {
void (*release_rx_buf)(struct mtk_wed_device *wed);
void (*update_wo_rx_stats)(struct mtk_wed_device *wed,
struct mtk_wed_wo_rx_stats *stats);
+ int (*reset)(struct mtk_wed_device *wed);
+ void (*reset_complete)(struct mtk_wed_device *wed);
} wlan;
#endif
};

View File

@ -1,106 +0,0 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Mon, 5 Dec 2022 12:34:42 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset to rx_ring_setup callback
This patch adds a reset parameter to the mtk_wed_rx_ring_setup
signature in order to align the rx_ring_setup callback with the
tx_ring_setup one introduced in commit 23dca7a90017 ("net: ethernet:
mtk_wed: add reset to tx_ring_setup callback").
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/29c6e7a5469e784406cf3e2920351d1207713d05.1670239984.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -1252,7 +1252,8 @@ mtk_wed_wdma_rx_ring_setup(struct mtk_we
}
static int
-mtk_wed_wdma_tx_ring_setup(struct mtk_wed_device *dev, int idx, int size)
+mtk_wed_wdma_tx_ring_setup(struct mtk_wed_device *dev, int idx, int size,
+ bool reset)
{
u32 desc_size = sizeof(struct mtk_wdma_desc) * dev->hw->version;
struct mtk_wed_ring *wdma;
@@ -1261,8 +1262,8 @@ mtk_wed_wdma_tx_ring_setup(struct mtk_we
return -EINVAL;
wdma = &dev->tx_wdma[idx];
- if (mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE, desc_size,
- true))
+ if (!reset && mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE,
+ desc_size, true))
return -ENOMEM;
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_BASE,
@@ -1272,6 +1273,9 @@ mtk_wed_wdma_tx_ring_setup(struct mtk_we
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_CPU_IDX, 0);
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_DMA_IDX, 0);
+ if (reset)
+ mtk_wed_ring_reset(wdma, MTK_WED_WDMA_RING_SIZE, true);
+
if (!idx) {
wed_w32(dev, MTK_WED_WDMA_RING_TX + MTK_WED_RING_OFS_BASE,
wdma->desc_phys);
@@ -1611,18 +1615,20 @@ mtk_wed_txfree_ring_setup(struct mtk_wed
}
static int
-mtk_wed_rx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs)
+mtk_wed_rx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs,
+ bool reset)
{
struct mtk_wed_ring *ring = &dev->rx_ring[idx];
if (WARN_ON(idx >= ARRAY_SIZE(dev->rx_ring)))
return -EINVAL;
- if (mtk_wed_ring_alloc(dev, ring, MTK_WED_RX_RING_SIZE,
- sizeof(*ring->desc), false))
+ if (!reset && mtk_wed_ring_alloc(dev, ring, MTK_WED_RX_RING_SIZE,
+ sizeof(*ring->desc), false))
return -ENOMEM;
- if (mtk_wed_wdma_tx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
+ if (mtk_wed_wdma_tx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE,
+ reset))
return -ENOMEM;
ring->reg_base = MTK_WED_RING_RX_DATA(idx);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -162,7 +162,7 @@ struct mtk_wed_ops {
int (*tx_ring_setup)(struct mtk_wed_device *dev, int ring,
void __iomem *regs, bool reset);
int (*rx_ring_setup)(struct mtk_wed_device *dev, int ring,
- void __iomem *regs);
+ void __iomem *regs, bool reset);
int (*txfree_ring_setup)(struct mtk_wed_device *dev,
void __iomem *regs);
int (*msg_update)(struct mtk_wed_device *dev, int cmd_id,
@@ -230,8 +230,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
(_dev)->ops->irq_get(_dev, _mask)
#define mtk_wed_device_irq_set_mask(_dev, _mask) \
(_dev)->ops->irq_set_mask(_dev, _mask)
-#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) \
- (_dev)->ops->rx_ring_setup(_dev, _ring, _regs)
+#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs, _reset) \
+ (_dev)->ops->rx_ring_setup(_dev, _ring, _regs, _reset)
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) \
(_dev)->ops->ppe_check(_dev, _skb, _reason, _hash)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) \
@@ -251,7 +251,7 @@ static inline bool mtk_wed_device_active
#define mtk_wed_device_reg_write(_dev, _reg, _val) do {} while (0)
#define mtk_wed_device_irq_get(_dev, _mask) 0
#define mtk_wed_device_irq_set_mask(_dev, _mask) do {} while (0)
-#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) -ENODEV
+#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs, _reset) -ENODEV
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) do {} while (0)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) -ENODEV
#define mtk_wed_device_stop(_dev) do {} while (0)

View File

@ -1,22 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Thu, 27 Oct 2022 19:50:31 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: account for vlan in rx
header length
The network stack assumes that devices can handle an extra VLAN tag
without increasing the MTU.
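The byte arithmetic behind the new definition, as a hedged sketch
(macro values taken from the if_ether/if_vlan headers):

#include <linux/if_ether.h>	/* ETH_HLEN (14), ETH_FCS_LEN (4) */
#include <linux/if_vlan.h>	/* VLAN_ETH_HLEN (18) */

/* old: ETH_HLEN + ETH_FCS_LEN      = 18 bytes
 * new: VLAN_ETH_HLEN + ETH_FCS_LEN = 22 bytes,
 * i.e. headroom for one 802.1Q tag on top of the configured MTU. */
#define MTK_RX_ETH_HLEN	(VLAN_ETH_HLEN + ETH_FCS_LEN)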
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -29,7 +29,7 @@
#define MTK_TX_DMA_BUF_LEN_V2 0xffff
#define MTK_DMA_SIZE 512
#define MTK_MAC_COUNT 2
-#define MTK_RX_ETH_HLEN (ETH_HLEN + ETH_FCS_LEN)
+#define MTK_RX_ETH_HLEN (VLAN_ETH_HLEN + ETH_FCS_LEN)
#define MTK_RX_HLEN (NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN)
#define MTK_DMA_DUMMY_DESC 0xffffffff
#define MTK_DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | \

View File

@ -1,143 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Thu, 27 Oct 2022 19:53:57 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: increase tx ring size for
QDMA devices
In order to use the hardware traffic shaper feature, a larger tx ring is
needed, especially for the scratch ring, which the hardware shaper uses to
reorder packets.
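For scale, a hedged back-of-the-envelope on the memory cost of the
larger ring (the 32-byte figure assumes the v2 tx descriptor size):

/* (MTK_QDMA_RING_SIZE - MTK_DMA_SIZE) * sizeof(struct mtk_tx_dma_v2)
 * = (2048 - 512) * 32 = 49152 bytes (~48 KiB) of additional coherent
 * DMA memory per QDMA tx/scratch ring. */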
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -944,7 +944,7 @@ static int mtk_init_fq_dma(struct mtk_et
{
const struct mtk_soc_data *soc = eth->soc;
dma_addr_t phy_ring_tail;
- int cnt = MTK_DMA_SIZE;
+ int cnt = MTK_QDMA_RING_SIZE;
dma_addr_t dma_addr;
int i;
@@ -2208,19 +2208,25 @@ static int mtk_tx_alloc(struct mtk_eth *
struct mtk_tx_ring *ring = &eth->tx_ring;
int i, sz = soc->txrx.txd_size;
struct mtk_tx_dma_v2 *txd;
+ int ring_size;
- ring->buf = kcalloc(MTK_DMA_SIZE, sizeof(*ring->buf),
+ if (MTK_HAS_CAPS(soc->caps, MTK_QDMA))
+ ring_size = MTK_QDMA_RING_SIZE;
+ else
+ ring_size = MTK_DMA_SIZE;
+
+ ring->buf = kcalloc(ring_size, sizeof(*ring->buf),
GFP_KERNEL);
if (!ring->buf)
goto no_tx_mem;
- ring->dma = dma_alloc_coherent(eth->dma_dev, MTK_DMA_SIZE * sz,
+ ring->dma = dma_alloc_coherent(eth->dma_dev, ring_size * sz,
&ring->phys, GFP_KERNEL);
if (!ring->dma)
goto no_tx_mem;
- for (i = 0; i < MTK_DMA_SIZE; i++) {
- int next = (i + 1) % MTK_DMA_SIZE;
+ for (i = 0; i < ring_size; i++) {
+ int next = (i + 1) % ring_size;
u32 next_ptr = ring->phys + next * sz;
txd = ring->dma + i * sz;
@@ -2240,22 +2246,22 @@ static int mtk_tx_alloc(struct mtk_eth *
* descriptors in ring->dma_pdma.
*/
if (!MTK_HAS_CAPS(soc->caps, MTK_QDMA)) {
- ring->dma_pdma = dma_alloc_coherent(eth->dma_dev, MTK_DMA_SIZE * sz,
+ ring->dma_pdma = dma_alloc_coherent(eth->dma_dev, ring_size * sz,
&ring->phys_pdma, GFP_KERNEL);
if (!ring->dma_pdma)
goto no_tx_mem;
- for (i = 0; i < MTK_DMA_SIZE; i++) {
+ for (i = 0; i < ring_size; i++) {
ring->dma_pdma[i].txd2 = TX_DMA_DESP2_DEF;
ring->dma_pdma[i].txd4 = 0;
}
}
- ring->dma_size = MTK_DMA_SIZE;
- atomic_set(&ring->free_count, MTK_DMA_SIZE - 2);
+ ring->dma_size = ring_size;
+ atomic_set(&ring->free_count, ring_size - 2);
ring->next_free = ring->dma;
ring->last_free = (void *)txd;
- ring->last_free_ptr = (u32)(ring->phys + ((MTK_DMA_SIZE - 1) * sz));
+ ring->last_free_ptr = (u32)(ring->phys + ((ring_size - 1) * sz));
ring->thresh = MAX_SKB_FRAGS;
/* make sure that all changes to the dma ring are flushed before we
@@ -2267,14 +2273,14 @@ static int mtk_tx_alloc(struct mtk_eth *
mtk_w32(eth, ring->phys, soc->reg_map->qdma.ctx_ptr);
mtk_w32(eth, ring->phys, soc->reg_map->qdma.dtx_ptr);
mtk_w32(eth,
- ring->phys + ((MTK_DMA_SIZE - 1) * sz),
+ ring->phys + ((ring_size - 1) * sz),
soc->reg_map->qdma.crx_ptr);
mtk_w32(eth, ring->last_free_ptr, soc->reg_map->qdma.drx_ptr);
mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES,
soc->reg_map->qdma.qtx_cfg);
} else {
mtk_w32(eth, ring->phys_pdma, MT7628_TX_BASE_PTR0);
- mtk_w32(eth, MTK_DMA_SIZE, MT7628_TX_MAX_CNT0);
+ mtk_w32(eth, ring_size, MT7628_TX_MAX_CNT0);
mtk_w32(eth, 0, MT7628_TX_CTX_IDX0);
mtk_w32(eth, MT7628_PST_DTX_IDX0, soc->reg_map->pdma.rst_idx);
}
@@ -2292,7 +2298,7 @@ static void mtk_tx_clean(struct mtk_eth
int i;
if (ring->buf) {
- for (i = 0; i < MTK_DMA_SIZE; i++)
+ for (i = 0; i < ring->dma_size; i++)
mtk_tx_unmap(eth, &ring->buf[i], NULL, false);
kfree(ring->buf);
ring->buf = NULL;
@@ -2300,14 +2306,14 @@ static void mtk_tx_clean(struct mtk_eth
if (ring->dma) {
dma_free_coherent(eth->dma_dev,
- MTK_DMA_SIZE * soc->txrx.txd_size,
+ ring->dma_size * soc->txrx.txd_size,
ring->dma, ring->phys);
ring->dma = NULL;
}
if (ring->dma_pdma) {
dma_free_coherent(eth->dma_dev,
- MTK_DMA_SIZE * soc->txrx.txd_size,
+ ring->dma_size * soc->txrx.txd_size,
ring->dma_pdma, ring->phys_pdma);
ring->dma_pdma = NULL;
}
@@ -2832,7 +2838,7 @@ static void mtk_dma_free(struct mtk_eth
netdev_reset_queue(eth->netdev[i]);
if (eth->scratch_ring) {
dma_free_coherent(eth->dma_dev,
- MTK_DMA_SIZE * soc->txrx.txd_size,
+ MTK_QDMA_RING_SIZE * soc->txrx.txd_size,
eth->scratch_ring, eth->phy_scratch_ring);
eth->scratch_ring = NULL;
eth->phy_scratch_ring = 0;
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -27,6 +27,7 @@
#define MTK_MAX_RX_LENGTH_2K 2048
#define MTK_TX_DMA_BUF_LEN 0x3fff
#define MTK_TX_DMA_BUF_LEN_V2 0xffff
+#define MTK_QDMA_RING_SIZE 2048
#define MTK_DMA_SIZE 512
#define MTK_MAC_COUNT 2
#define MTK_RX_ETH_HLEN (VLAN_ETH_HLEN + ETH_FCS_LEN)

View File

@ -1,52 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 4 Nov 2022 19:49:08 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: avoid port_mg assignment on
MT7622 and newer
On newer chips, this field is no longer used as a port group mask and
instead contains some bits related to queue assignment. Initialize it
to 0 in those cases.
Fix offload_version on MT7621 and MT7623, which still need the previous
value.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -4479,7 +4479,7 @@ static const struct mtk_soc_data mt7621_
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7621_CLKS_BITMAP,
.required_pctl = false,
- .offload_version = 2,
+ .offload_version = 1,
.hash_offset = 2,
.foe_entry_size = sizeof(struct mtk_foe_entry) - 16,
.txrx = {
@@ -4518,7 +4518,7 @@ static const struct mtk_soc_data mt7623_
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7623_CLKS_BITMAP,
.required_pctl = true,
- .offload_version = 2,
+ .offload_version = 1,
.hash_offset = 2,
.foe_entry_size = sizeof(struct mtk_foe_entry) - 16,
.txrx = {
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -175,6 +175,8 @@ int mtk_foe_entry_prepare(struct mtk_eth
val = FIELD_PREP(MTK_FOE_IB2_DEST_PORT_V2, pse_port) |
FIELD_PREP(MTK_FOE_IB2_PORT_AG_V2, 0xf);
} else {
+ int port_mg = eth->soc->offload_version > 1 ? 0 : 0x3f;
+
val = FIELD_PREP(MTK_FOE_IB1_STATE, MTK_FOE_STATE_BIND) |
FIELD_PREP(MTK_FOE_IB1_PACKET_TYPE, type) |
FIELD_PREP(MTK_FOE_IB1_UDP, l4proto == IPPROTO_UDP) |
@@ -182,7 +184,7 @@ int mtk_foe_entry_prepare(struct mtk_eth
entry->ib1 = val;
val = FIELD_PREP(MTK_FOE_IB2_DEST_PORT, pse_port) |
- FIELD_PREP(MTK_FOE_IB2_PORT_MG, 0x3f) |
+ FIELD_PREP(MTK_FOE_IB2_PORT_MG, port_mg) |
FIELD_PREP(MTK_FOE_IB2_PORT_AG, 0x1f);
}

View File

@ -1,654 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Thu, 27 Oct 2022 20:17:27 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: implement multi-queue
support for per-port queues
When sending traffic to multiple ports with different link speeds, queued
packets to one port can drown out tx to other ports.
In order to better handle transmission to multiple ports, use the hardware
shaper feature to implement weighted fair queueing between ports.
Weight and maximum rate are automatically adjusted based on the link speed
of the port.
The first 3 queues are unrestricted and reserved for non-DSA direct tx on
GMAC ports. The following queues are automatically assigned by the MTK DSA
tag driver based on the target port number.
The PPE offload code configures the queues for offloaded traffic in the same
way.
This feature is only supported on devices supporting QDMA. All queues still
share the same DMA ring and descriptor pool.
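The shaper encodes each queue's rate as mantissa * 10^exponent; on
NETSYS v2 the unit appears to be kbit/s (1 * 10^5 = 100 Mbit/s,
10 * 10^5 = 1 Gbit/s), while the MT7621 constants suggest a different
clock base. A hedged helper capturing the encoding used in the diff
below:

/* Build one MTK_QTX_SCH max-rate value; the kbit/s unit on NETSYS v2
 * is an inference from the SPEED_100/SPEED_1000 cases, not a datasheet
 * fact. */
static u32 mtk_qtx_sch_max_rate(u32 mantissa, u32 exponent, u32 weight)
{
	return MTK_QTX_SCH_MAX_RATE_EN |
	       FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, mantissa) |
	       FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, exponent) |
	       FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, weight);
}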
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -55,6 +55,7 @@ static const struct mtk_reg_map mtk_reg_
},
.qdma = {
.qtx_cfg = 0x1800,
+ .qtx_sch = 0x1804,
.rx_ptr = 0x1900,
.rx_cnt_cfg = 0x1904,
.qcrx_ptr = 0x1908,
@@ -62,6 +63,7 @@ static const struct mtk_reg_map mtk_reg_
.rst_idx = 0x1a08,
.delay_irq = 0x1a0c,
.fc_th = 0x1a10,
+ .tx_sch_rate = 0x1a14,
.int_grp = 0x1a20,
.hred = 0x1a44,
.ctx_ptr = 0x1b00,
@@ -117,6 +119,7 @@ static const struct mtk_reg_map mt7986_r
},
.qdma = {
.qtx_cfg = 0x4400,
+ .qtx_sch = 0x4404,
.rx_ptr = 0x4500,
.rx_cnt_cfg = 0x4504,
.qcrx_ptr = 0x4508,
@@ -134,6 +137,7 @@ static const struct mtk_reg_map mt7986_r
.fq_tail = 0x4724,
.fq_count = 0x4728,
.fq_blen = 0x472c,
+ .tx_sch_rate = 0x4798,
},
.gdm1_cnt = 0x1c00,
.gdma_to_ppe = 0x3333,
@@ -620,6 +624,75 @@ static void mtk_mac_link_down(struct phy
mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
}
+static void mtk_set_queue_speed(struct mtk_eth *eth, unsigned int idx,
+ int speed)
+{
+ const struct mtk_soc_data *soc = eth->soc;
+ u32 ofs, val;
+
+ if (!MTK_HAS_CAPS(soc->caps, MTK_QDMA))
+ return;
+
+ val = MTK_QTX_SCH_MIN_RATE_EN |
+ /* minimum: 10 Mbps */
+ FIELD_PREP(MTK_QTX_SCH_MIN_RATE_MAN, 1) |
+ FIELD_PREP(MTK_QTX_SCH_MIN_RATE_EXP, 4) |
+ MTK_QTX_SCH_LEAKY_BUCKET_SIZE;
+ if (!MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ val |= MTK_QTX_SCH_LEAKY_BUCKET_EN;
+
+ if (IS_ENABLED(CONFIG_SOC_MT7621)) {
+ switch (speed) {
+ case SPEED_10:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 103) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 2) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 1);
+ break;
+ case SPEED_100:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 103) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 3) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 1);
+ break;
+ case SPEED_1000:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 105) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 4) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 10);
+ break;
+ default:
+ break;
+ }
+ } else {
+ switch (speed) {
+ case SPEED_10:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 1) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 4) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 1);
+ break;
+ case SPEED_100:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 1) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 5) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 1);
+ break;
+ case SPEED_1000:
+ val |= MTK_QTX_SCH_MAX_RATE_EN |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_MAN, 10) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_EXP, 5) |
+ FIELD_PREP(MTK_QTX_SCH_MAX_RATE_WEIGHT, 10);
+ break;
+ default:
+ break;
+ }
+ }
+
+ ofs = MTK_QTX_OFFSET * idx;
+ mtk_w32(eth, val, soc->reg_map->qdma.qtx_sch + ofs);
+}
+
static void mtk_mac_link_up(struct phylink_config *config,
struct phy_device *phy,
unsigned int mode, phy_interface_t interface,
@@ -645,6 +718,8 @@ static void mtk_mac_link_up(struct phyli
break;
}
+ mtk_set_queue_speed(mac->hw, mac->id, speed);
+
/* Configure duplex */
if (duplex == DUPLEX_FULL)
mcr |= MAC_MCR_FORCE_DPX;
@@ -1105,7 +1180,8 @@ static void mtk_tx_set_dma_desc_v1(struc
WRITE_ONCE(desc->txd1, info->addr);
- data = TX_DMA_SWC | TX_DMA_PLEN0(info->size);
+ data = TX_DMA_SWC | TX_DMA_PLEN0(info->size) |
+ FIELD_PREP(TX_DMA_PQID, info->qid);
if (info->last)
data |= TX_DMA_LS0;
WRITE_ONCE(desc->txd3, data);
@@ -1139,9 +1215,6 @@ static void mtk_tx_set_dma_desc_v2(struc
data |= TX_DMA_LS0;
WRITE_ONCE(desc->txd3, data);
- if (!info->qid && mac->id)
- info->qid = MTK_QDMA_GMAC2_QID;
-
data = (mac->id + 1) << TX_DMA_FPORT_SHIFT_V2; /* forward port */
data |= TX_DMA_SWC_V2 | QID_BITS_V2(info->qid);
WRITE_ONCE(desc->txd4, data);
@@ -1185,11 +1258,12 @@ static int mtk_tx_map(struct sk_buff *sk
.gso = gso,
.csum = skb->ip_summed == CHECKSUM_PARTIAL,
.vlan = skb_vlan_tag_present(skb),
- .qid = skb->mark & MTK_QDMA_TX_MASK,
+ .qid = skb_get_queue_mapping(skb),
.vlan_tci = skb_vlan_tag_get(skb),
.first = true,
.last = !skb_is_nonlinear(skb),
};
+ struct netdev_queue *txq;
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
const struct mtk_soc_data *soc = eth->soc;
@@ -1197,8 +1271,10 @@ static int mtk_tx_map(struct sk_buff *sk
struct mtk_tx_dma *itxd_pdma, *txd_pdma;
struct mtk_tx_buf *itx_buf, *tx_buf;
int i, n_desc = 1;
+ int queue = skb_get_queue_mapping(skb);
int k = 0;
+ txq = netdev_get_tx_queue(dev, queue);
itxd = ring->next_free;
itxd_pdma = qdma_to_pdma(ring, itxd);
if (itxd == ring->last_free)
@@ -1247,7 +1323,7 @@ static int mtk_tx_map(struct sk_buff *sk
memset(&txd_info, 0, sizeof(struct mtk_tx_dma_desc_info));
txd_info.size = min_t(unsigned int, frag_size,
soc->txrx.dma_max_len);
- txd_info.qid = skb->mark & MTK_QDMA_TX_MASK;
+ txd_info.qid = queue;
txd_info.last = i == skb_shinfo(skb)->nr_frags - 1 &&
!(frag_size - txd_info.size);
txd_info.addr = skb_frag_dma_map(eth->dma_dev, frag,
@@ -1286,7 +1362,7 @@ static int mtk_tx_map(struct sk_buff *sk
txd_pdma->txd2 |= TX_DMA_LS1;
}
- netdev_sent_queue(dev, skb->len);
+ netdev_tx_sent_queue(txq, skb->len);
skb_tx_timestamp(skb);
ring->next_free = mtk_qdma_phys_to_virt(ring, txd->txd2);
@@ -1298,8 +1374,7 @@ static int mtk_tx_map(struct sk_buff *sk
wmb();
if (MTK_HAS_CAPS(soc->caps, MTK_QDMA)) {
- if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) ||
- !netdev_xmit_more())
+ if (netif_xmit_stopped(txq) || !netdev_xmit_more())
mtk_w32(eth, txd->txd2, soc->reg_map->qdma.ctx_ptr);
} else {
int next_idx;
@@ -1368,7 +1443,7 @@ static void mtk_wake_queue(struct mtk_et
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!eth->netdev[i])
continue;
- netif_wake_queue(eth->netdev[i]);
+ netif_tx_wake_all_queues(eth->netdev[i]);
}
}
@@ -1392,7 +1467,7 @@ static netdev_tx_t mtk_start_xmit(struct
tx_num = mtk_cal_txd_req(eth, skb);
if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
- netif_stop_queue(dev);
+ netif_tx_stop_all_queues(dev);
netif_err(eth, tx_queued, dev,
"Tx Ring full when queue awake!\n");
spin_unlock(&eth->page_lock);
@@ -1418,7 +1493,7 @@ static netdev_tx_t mtk_start_xmit(struct
goto drop;
if (unlikely(atomic_read(&ring->free_count) <= ring->thresh))
- netif_stop_queue(dev);
+ netif_tx_stop_all_queues(dev);
spin_unlock(&eth->page_lock);
@@ -1585,10 +1660,12 @@ static int mtk_xdp_submit_frame(struct m
struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
const struct mtk_soc_data *soc = eth->soc;
struct mtk_tx_ring *ring = &eth->tx_ring;
+ struct mtk_mac *mac = netdev_priv(dev);
struct mtk_tx_dma_desc_info txd_info = {
.size = xdpf->len,
.first = true,
.last = !xdp_frame_has_frags(xdpf),
+ .qid = mac->id,
};
int err, index = 0, n_desc = 1, nr_frags;
struct mtk_tx_buf *htx_buf, *tx_buf;
@@ -1638,6 +1715,7 @@ static int mtk_xdp_submit_frame(struct m
memset(&txd_info, 0, sizeof(struct mtk_tx_dma_desc_info));
txd_info.size = skb_frag_size(&sinfo->frags[index]);
txd_info.last = index + 1 == nr_frags;
+ txd_info.qid = mac->id;
data = skb_frag_address(&sinfo->frags[index]);
index++;
@@ -1992,8 +2070,46 @@ rx_done:
return done;
}
+struct mtk_poll_state {
+ struct netdev_queue *txq;
+ unsigned int total;
+ unsigned int done;
+ unsigned int bytes;
+};
+
+static void
+mtk_poll_tx_done(struct mtk_eth *eth, struct mtk_poll_state *state, u8 mac,
+ struct sk_buff *skb)
+{
+ struct netdev_queue *txq;
+ struct net_device *dev;
+ unsigned int bytes = skb->len;
+
+ state->total++;
+ eth->tx_packets++;
+ eth->tx_bytes += bytes;
+
+ dev = eth->netdev[mac];
+ if (!dev)
+ return;
+
+ txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
+ if (state->txq == txq) {
+ state->done++;
+ state->bytes += bytes;
+ return;
+ }
+
+ if (state->txq)
+ netdev_tx_completed_queue(state->txq, state->done, state->bytes);
+
+ state->txq = txq;
+ state->done = 1;
+ state->bytes = bytes;
+}
+
static int mtk_poll_tx_qdma(struct mtk_eth *eth, int budget,
- unsigned int *done, unsigned int *bytes)
+ struct mtk_poll_state *state)
{
const struct mtk_reg_map *reg_map = eth->soc->reg_map;
struct mtk_tx_ring *ring = &eth->tx_ring;
@@ -2025,12 +2141,9 @@ static int mtk_poll_tx_qdma(struct mtk_e
break;
if (tx_buf->data != (void *)MTK_DMA_DUMMY_DESC) {
- if (tx_buf->type == MTK_TYPE_SKB) {
- struct sk_buff *skb = tx_buf->data;
+ if (tx_buf->type == MTK_TYPE_SKB)
+ mtk_poll_tx_done(eth, state, mac, tx_buf->data);
- bytes[mac] += skb->len;
- done[mac]++;
- }
budget--;
}
mtk_tx_unmap(eth, tx_buf, &bq, true);
@@ -2049,7 +2162,7 @@ static int mtk_poll_tx_qdma(struct mtk_e
}
static int mtk_poll_tx_pdma(struct mtk_eth *eth, int budget,
- unsigned int *done, unsigned int *bytes)
+ struct mtk_poll_state *state)
{
struct mtk_tx_ring *ring = &eth->tx_ring;
struct mtk_tx_buf *tx_buf;
@@ -2067,12 +2180,8 @@ static int mtk_poll_tx_pdma(struct mtk_e
break;
if (tx_buf->data != (void *)MTK_DMA_DUMMY_DESC) {
- if (tx_buf->type == MTK_TYPE_SKB) {
- struct sk_buff *skb = tx_buf->data;
-
- bytes[0] += skb->len;
- done[0]++;
- }
+ if (tx_buf->type == MTK_TYPE_SKB)
+ mtk_poll_tx_done(eth, state, 0, tx_buf->data);
budget--;
}
mtk_tx_unmap(eth, tx_buf, &bq, true);
@@ -2094,26 +2203,15 @@ static int mtk_poll_tx(struct mtk_eth *e
{
struct mtk_tx_ring *ring = &eth->tx_ring;
struct dim_sample dim_sample = {};
- unsigned int done[MTK_MAX_DEVS];
- unsigned int bytes[MTK_MAX_DEVS];
- int total = 0, i;
-
- memset(done, 0, sizeof(done));
- memset(bytes, 0, sizeof(bytes));
+ struct mtk_poll_state state = {};
if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
- budget = mtk_poll_tx_qdma(eth, budget, done, bytes);
+ budget = mtk_poll_tx_qdma(eth, budget, &state);
else
- budget = mtk_poll_tx_pdma(eth, budget, done, bytes);
+ budget = mtk_poll_tx_pdma(eth, budget, &state);
- for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->netdev[i] || !done[i])
- continue;
- netdev_completed_queue(eth->netdev[i], done[i], bytes[i]);
- total += done[i];
- eth->tx_packets += done[i];
- eth->tx_bytes += bytes[i];
- }
+ if (state.txq)
+ netdev_tx_completed_queue(state.txq, state.done, state.bytes);
dim_update_sample(eth->tx_events, eth->tx_packets, eth->tx_bytes,
&dim_sample);
@@ -2123,7 +2221,7 @@ static int mtk_poll_tx(struct mtk_eth *e
(atomic_read(&ring->free_count) > ring->thresh))
mtk_wake_queue(eth);
- return total;
+ return state.total;
}
static void mtk_handle_status_irq(struct mtk_eth *eth)
@@ -2209,6 +2307,7 @@ static int mtk_tx_alloc(struct mtk_eth *
int i, sz = soc->txrx.txd_size;
struct mtk_tx_dma_v2 *txd;
int ring_size;
+ u32 ofs, val;
if (MTK_HAS_CAPS(soc->caps, MTK_QDMA))
ring_size = MTK_QDMA_RING_SIZE;
@@ -2276,8 +2375,25 @@ static int mtk_tx_alloc(struct mtk_eth *
ring->phys + ((ring_size - 1) * sz),
soc->reg_map->qdma.crx_ptr);
mtk_w32(eth, ring->last_free_ptr, soc->reg_map->qdma.drx_ptr);
- mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES,
- soc->reg_map->qdma.qtx_cfg);
+
+ for (i = 0, ofs = 0; i < MTK_QDMA_NUM_QUEUES; i++) {
+ val = (QDMA_RES_THRES << 8) | QDMA_RES_THRES;
+ mtk_w32(eth, val, soc->reg_map->qdma.qtx_cfg + ofs);
+
+ val = MTK_QTX_SCH_MIN_RATE_EN |
+ /* minimum: 10 Mbps */
+ FIELD_PREP(MTK_QTX_SCH_MIN_RATE_MAN, 1) |
+ FIELD_PREP(MTK_QTX_SCH_MIN_RATE_EXP, 4) |
+ MTK_QTX_SCH_LEAKY_BUCKET_SIZE;
+ if (!MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ val |= MTK_QTX_SCH_LEAKY_BUCKET_EN;
+ mtk_w32(eth, val, soc->reg_map->qdma.qtx_sch + ofs);
+ ofs += MTK_QTX_OFFSET;
+ }
+ val = MTK_QDMA_TX_SCH_MAX_WFQ | (MTK_QDMA_TX_SCH_MAX_WFQ << 16);
+ mtk_w32(eth, val, soc->reg_map->qdma.tx_sch_rate);
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ mtk_w32(eth, val, soc->reg_map->qdma.tx_sch_rate + 4);
} else {
mtk_w32(eth, ring->phys_pdma, MT7628_TX_BASE_PTR0);
mtk_w32(eth, ring_size, MT7628_TX_MAX_CNT0);
@@ -2962,7 +3078,7 @@ static int mtk_start_dma(struct mtk_eth
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
val |= MTK_MUTLI_CNT | MTK_RESV_BUF |
MTK_WCOMP_EN | MTK_DMAD_WR_WDONE |
- MTK_CHK_DDONE_EN;
+ MTK_CHK_DDONE_EN | MTK_LEAKY_BUCKET_EN;
else
val |= MTK_RX_BT_32DWORDS;
mtk_w32(eth, val, reg_map->qdma.glo_cfg);
@@ -3008,6 +3124,45 @@ static void mtk_gdm_config(struct mtk_et
mtk_w32(eth, 0, MTK_RST_GL);
}
+static int mtk_device_event(struct notifier_block *n, unsigned long event, void *ptr)
+{
+ struct mtk_mac *mac = container_of(n, struct mtk_mac, device_notifier);
+ struct mtk_eth *eth = mac->hw;
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct ethtool_link_ksettings s;
+ struct net_device *ldev;
+ struct list_head *iter;
+ struct dsa_port *dp;
+
+ if (event != NETDEV_CHANGE)
+ return NOTIFY_DONE;
+
+ netdev_for_each_lower_dev(dev, ldev, iter) {
+ if (netdev_priv(ldev) == mac)
+ goto found;
+ }
+
+ return NOTIFY_DONE;
+
+found:
+ if (!dsa_slave_dev_check(dev))
+ return NOTIFY_DONE;
+
+ if (__ethtool_get_link_ksettings(dev, &s))
+ return NOTIFY_DONE;
+
+ if (s.base.speed == 0 || s.base.speed == ((__u32)-1))
+ return NOTIFY_DONE;
+
+ dp = dsa_port_from_netdev(dev);
+ if (dp->index >= MTK_QDMA_NUM_QUEUES)
+ return NOTIFY_DONE;
+
+ mtk_set_queue_speed(eth, dp->index + 3, s.base.speed);
+
+ return NOTIFY_DONE;
+}
+
static int mtk_open(struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
@@ -3050,7 +3205,8 @@ static int mtk_open(struct net_device *d
refcount_inc(&eth->dma_refcnt);
phylink_start(mac->phylink);
- netif_start_queue(dev);
+ netif_tx_start_all_queues(dev);
+
return 0;
}
@@ -3759,8 +3915,12 @@ static int mtk_unreg_dev(struct mtk_eth
int i;
for (i = 0; i < MTK_MAC_COUNT; i++) {
+ struct mtk_mac *mac;
if (!eth->netdev[i])
continue;
+ mac = netdev_priv(eth->netdev[i]);
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+ unregister_netdevice_notifier(&mac->device_notifier);
unregister_netdev(eth->netdev[i]);
}
@@ -3977,6 +4137,23 @@ static int mtk_set_rxnfc(struct net_devi
return ret;
}
+static u16 mtk_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ struct mtk_mac *mac = netdev_priv(dev);
+ unsigned int queue = 0;
+
+ if (netdev_uses_dsa(dev))
+ queue = skb_get_queue_mapping(skb) + 3;
+ else
+ queue = mac->id;
+
+ if (queue >= dev->num_tx_queues)
+ queue = 0;
+
+ return queue;
+}
+
static const struct ethtool_ops mtk_ethtool_ops = {
.get_link_ksettings = mtk_get_link_ksettings,
.set_link_ksettings = mtk_set_link_ksettings,
@@ -4011,6 +4188,7 @@ static const struct net_device_ops mtk_n
.ndo_setup_tc = mtk_eth_setup_tc,
.ndo_bpf = mtk_xdp,
.ndo_xdp_xmit = mtk_xdp_xmit,
+ .ndo_select_queue = mtk_select_queue,
};
static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
@@ -4020,6 +4198,7 @@ static int mtk_add_mac(struct mtk_eth *e
struct phylink *phylink;
struct mtk_mac *mac;
int id, err;
+ int txqs = 1;
if (!_id) {
dev_err(eth->dev, "missing mac id\n");
@@ -4037,7 +4216,10 @@ static int mtk_add_mac(struct mtk_eth *e
return -EINVAL;
}
- eth->netdev[id] = alloc_etherdev(sizeof(*mac));
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+ txqs = MTK_QDMA_NUM_QUEUES;
+
+ eth->netdev[id] = alloc_etherdev_mqs(sizeof(*mac), txqs, 1);
if (!eth->netdev[id]) {
dev_err(eth->dev, "alloc_etherdev failed\n");
return -ENOMEM;
@@ -4145,6 +4327,11 @@ static int mtk_add_mac(struct mtk_eth *e
else
eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH_2K - MTK_RX_ETH_HLEN;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+ mac->device_notifier.notifier_call = mtk_device_event;
+ register_netdevice_notifier(&mac->device_notifier);
+ }
+
return 0;
free_netdev:
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -22,6 +22,7 @@
#include <linux/bpf_trace.h>
#include "mtk_ppe.h"
+#define MTK_QDMA_NUM_QUEUES 16
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
#define MTK_MAX_RX_LENGTH_2K 2048
@@ -216,8 +217,26 @@
#define MTK_RING_MAX_AGG_CNT_H ((MTK_HW_LRO_MAX_AGG_CNT >> 6) & 0x3)
/* QDMA TX Queue Configuration Registers */
+#define MTK_QTX_OFFSET 0x10
#define QDMA_RES_THRES 4
+/* QDMA Tx Queue Scheduler Configuration Registers */
+#define MTK_QTX_SCH_TX_SEL BIT(31)
+#define MTK_QTX_SCH_TX_SEL_V2 GENMASK(31, 30)
+
+#define MTK_QTX_SCH_LEAKY_BUCKET_EN BIT(30)
+#define MTK_QTX_SCH_LEAKY_BUCKET_SIZE GENMASK(29, 28)
+#define MTK_QTX_SCH_MIN_RATE_EN BIT(27)
+#define MTK_QTX_SCH_MIN_RATE_MAN GENMASK(26, 20)
+#define MTK_QTX_SCH_MIN_RATE_EXP GENMASK(19, 16)
+#define MTK_QTX_SCH_MAX_RATE_WEIGHT GENMASK(15, 12)
+#define MTK_QTX_SCH_MAX_RATE_EN BIT(11)
+#define MTK_QTX_SCH_MAX_RATE_MAN GENMASK(10, 4)
+#define MTK_QTX_SCH_MAX_RATE_EXP GENMASK(3, 0)
+
+/* QDMA TX Scheduler Rate Control Register */
+#define MTK_QDMA_TX_SCH_MAX_WFQ BIT(15)
+
/* QDMA Global Configuration Register */
#define MTK_RX_2B_OFFSET BIT(31)
#define MTK_RX_BT_32DWORDS (3 << 11)
@@ -236,6 +255,7 @@
#define MTK_WCOMP_EN BIT(24)
#define MTK_RESV_BUF (0x40 << 16)
#define MTK_MUTLI_CNT (0x4 << 12)
+#define MTK_LEAKY_BUCKET_EN BIT(11)
/* QDMA Flow Control Register */
#define FC_THRES_DROP_MODE BIT(20)
@@ -266,8 +286,6 @@
#define MTK_STAT_OFFSET 0x40
/* QDMA TX NUM */
-#define MTK_QDMA_TX_NUM 16
-#define MTK_QDMA_TX_MASK (MTK_QDMA_TX_NUM - 1)
#define QID_BITS_V2(x) (((x) & 0x3f) << 16)
#define MTK_QDMA_GMAC2_QID 8
@@ -297,6 +315,7 @@
#define TX_DMA_PLEN0(x) (((x) & eth->soc->txrx.dma_max_len) << eth->soc->txrx.dma_len_offset)
#define TX_DMA_PLEN1(x) ((x) & eth->soc->txrx.dma_max_len)
#define TX_DMA_SWC BIT(14)
+#define TX_DMA_PQID GENMASK(3, 0)
/* PDMA on MT7628 */
#define TX_DMA_DONE BIT(31)
@@ -957,6 +976,7 @@ struct mtk_reg_map {
} pdma;
struct {
u32 qtx_cfg; /* tx queue configuration */
+ u32 qtx_sch; /* tx queue scheduler configuration */
u32 rx_ptr; /* rx base pointer */
u32 rx_cnt_cfg; /* rx max count configuration */
u32 qcrx_ptr; /* rx cpu pointer */
@@ -974,6 +994,7 @@ struct mtk_reg_map {
u32 fq_tail; /* fq tail pointer */
u32 fq_count; /* fq free page count */
u32 fq_blen; /* fq free page buffer length */
+ u32 tx_sch_rate; /* tx scheduler rate control registers */
} qdma;
u32 gdm1_cnt;
u32 gdma_to_ppe;
@@ -1177,6 +1198,7 @@ struct mtk_mac {
__be32 hwlro_ip[MTK_MAX_LRO_IP_CNT];
int hwlro_ip_cnt;
unsigned int syscfg0;
+ struct notifier_block device_notifier;
};
/* the struct describing the SoC. these are declared in the soc_xyz.c files */

View File

@@ -1,20 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 28 Oct 2022 18:16:03 +0200
Subject: [PATCH] net: dsa: tag_mtk: assign per-port queues
Keeps traffic sent to the switch within link speed limits
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -25,6 +25,8 @@ static struct sk_buff *mtk_tag_xmit(stru
u8 xmit_tpid;
u8 *mtk_tag;
+ skb_set_queue_mapping(skb, dp->index);
+
/* Build the special tag after the MAC Source Address. If VLAN header
* is present, it's required that VLAN header and special tag is
* being combined. Only in this way can the switch parse

View File

@@ -1,93 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Thu, 3 Nov 2022 17:49:44 +0100
Subject: [PATCH] net: ethernet: mediatek: ppe: assign per-port queues
for offloaded traffic
Keeps traffic sent to the switch within link speed limits
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
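For orientation, the queue mapping these hunks implement, shown as a standalone sketch: flows leaving through a DSA switch port use queue 3 + port, matching the assignment made by the MTK tag driver, while direct GMAC traffic uses pse_port - 1. The function name below is hypothetical.
/* Sketch of the mapping implemented in mtk_flow_set_output_device() below;
 * not part of the patch.
 */
static unsigned int example_offload_queue(int dsa_port, int pse_port)
{
        if (dsa_port >= 0)
                return 3 + dsa_port;    /* per-switch-port queues start at 3 */
        return pse_port - 1;            /* direct GMAC tx */
}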
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -399,6 +399,24 @@ int mtk_foe_entry_set_wdma(struct mtk_et
return 0;
}
+int mtk_foe_entry_set_queue(struct mtk_eth *eth, struct mtk_foe_entry *entry,
+ unsigned int queue)
+{
+ u32 *ib2 = mtk_foe_entry_ib2(eth, entry);
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
+ *ib2 &= ~MTK_FOE_IB2_QID_V2;
+ *ib2 |= FIELD_PREP(MTK_FOE_IB2_QID_V2, queue);
+ *ib2 |= MTK_FOE_IB2_PSE_QOS_V2;
+ } else {
+ *ib2 &= ~MTK_FOE_IB2_QID;
+ *ib2 |= FIELD_PREP(MTK_FOE_IB2_QID, queue);
+ *ib2 |= MTK_FOE_IB2_PSE_QOS;
+ }
+
+ return 0;
+}
+
static bool
mtk_flow_entry_match(struct mtk_eth *eth, struct mtk_flow_entry *entry,
struct mtk_foe_entry *data)
--- a/drivers/net/ethernet/mediatek/mtk_ppe.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.h
@@ -68,7 +68,9 @@ enum {
#define MTK_FOE_IB2_DSCP GENMASK(31, 24)
/* CONFIG_MEDIATEK_NETSYS_V2 */
+#define MTK_FOE_IB2_QID_V2 GENMASK(6, 0)
#define MTK_FOE_IB2_PORT_MG_V2 BIT(7)
+#define MTK_FOE_IB2_PSE_QOS_V2 BIT(8)
#define MTK_FOE_IB2_DEST_PORT_V2 GENMASK(12, 9)
#define MTK_FOE_IB2_MULTICAST_V2 BIT(13)
#define MTK_FOE_IB2_WDMA_WINFO_V2 BIT(19)
@@ -351,6 +353,8 @@ int mtk_foe_entry_set_pppoe(struct mtk_e
int sid);
int mtk_foe_entry_set_wdma(struct mtk_eth *eth, struct mtk_foe_entry *entry,
int wdma_idx, int txq, int bss, int wcid);
+int mtk_foe_entry_set_queue(struct mtk_eth *eth, struct mtk_foe_entry *entry,
+ unsigned int queue);
int mtk_foe_entry_commit(struct mtk_ppe *ppe, struct mtk_flow_entry *entry);
void mtk_foe_entry_clear(struct mtk_ppe *ppe, struct mtk_flow_entry *entry);
int mtk_foe_entry_idle_time(struct mtk_ppe *ppe, struct mtk_flow_entry *entry);
--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -188,7 +188,7 @@ mtk_flow_set_output_device(struct mtk_et
int *wed_index)
{
struct mtk_wdma_info info = {};
- int pse_port, dsa_port;
+ int pse_port, dsa_port, queue;
if (mtk_flow_get_wdma_info(dev, dest_mac, &info) == 0) {
mtk_foe_entry_set_wdma(eth, foe, info.wdma_idx, info.queue,
@@ -212,8 +212,6 @@ mtk_flow_set_output_device(struct mtk_et
}
dsa_port = mtk_flow_get_dsa_port(&dev);
- if (dsa_port >= 0)
- mtk_foe_entry_set_dsa(eth, foe, dsa_port);
if (dev == eth->netdev[0])
pse_port = 1;
@@ -222,6 +220,14 @@ mtk_flow_set_output_device(struct mtk_et
else
return -EOPNOTSUPP;
+ if (dsa_port >= 0) {
+ mtk_foe_entry_set_dsa(eth, foe, dsa_port);
+ queue = 3 + dsa_port;
+ } else {
+ queue = pse_port - 1;
+ }
+ mtk_foe_entry_set_queue(eth, foe, queue);
+
out:
mtk_foe_entry_set_pse_port(eth, foe, pse_port);

View File

@@ -1,72 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Tue, 8 Nov 2022 15:03:15 +0100
Subject: [PATCH] net: dsa: add support for DSA rx offloading via
metadata dst
If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu
port netdev, assume that it carries the port number and that there is no DSA
tag present in the skb data.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
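For context, a sketch of the producer side this patch expects (hedged; mtk_eth_soc becomes the in-tree user in a later patch of this series): on rx the driver strips or avoids the hardware tag, then attaches a METADATA_HW_PORT_MUX dst carrying the source port. All names below are illustrative.
#include <linux/netdevice.h>
#include <net/dst_metadata.h>

static struct metadata_dst *example_md_dst[8];  /* hypothetical per-port cache */

static int example_port_init(unsigned int port)
{
        example_md_dst[port] = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
                                                  GFP_KERNEL);
        if (!example_md_dst[port])
                return -ENOMEM;
        example_md_dst[port]->u.port_info.port_id = port;
        return 0;
}

static void example_rx(struct sk_buff *skb, unsigned int port)
{
        /* no DSA tag remains in skb->data; the port travels out of band */
        skb_dst_set_noref(skb, &example_md_dst[port]->dst);
        netif_receive_skb(skb);
}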
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -971,12 +971,14 @@ bool __skb_flow_dissect(const struct net
#if IS_ENABLED(CONFIG_NET_DSA)
if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
proto == htons(ETH_P_XDSA))) {
+ struct metadata_dst *md_dst = skb_metadata_dst(skb);
const struct dsa_device_ops *ops;
int offset = 0;
ops = skb->dev->dsa_ptr->tag_ops;
/* Only DSA header taggers break flow dissection */
- if (ops->needed_headroom) {
+ if (ops->needed_headroom &&
+ (!md_dst || md_dst->type != METADATA_HW_PORT_MUX)) {
if (ops->flow_dissect)
ops->flow_dissect(skb, &proto, &offset);
else
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -11,6 +11,7 @@
#include <linux/netdevice.h>
#include <linux/sysfs.h>
#include <linux/ptp_classify.h>
+#include <net/dst_metadata.h>
#include "dsa_priv.h"
@@ -216,6 +217,7 @@ static bool dsa_skb_defer_rx_timestamp(s
static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *unused)
{
+ struct metadata_dst *md_dst = skb_metadata_dst(skb);
struct dsa_port *cpu_dp = dev->dsa_ptr;
struct sk_buff *nskb = NULL;
struct dsa_slave_priv *p;
@@ -229,7 +231,22 @@ static int dsa_switch_rcv(struct sk_buff
if (!skb)
return 0;
- nskb = cpu_dp->rcv(skb, dev);
+ if (md_dst && md_dst->type == METADATA_HW_PORT_MUX) {
+ unsigned int port = md_dst->u.port_info.port_id;
+
+ skb_dst_drop(skb);
+ if (!skb_has_extensions(skb))
+ skb->slow_gro = 0;
+
+ skb->dev = dsa_master_find_slave(dev, 0, port);
+ if (likely(skb->dev)) {
+ dsa_default_offload_fwd_mark(skb);
+ nskb = skb;
+ }
+ } else {
+ nskb = cpu_dp->rcv(skb, dev);
+ }
+
if (!nskb) {
kfree_skb(skb);
return 0;

View File

@@ -1,192 +0,0 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 28 Oct 2022 11:01:12 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: fix VLAN rx hardware
acceleration
- enable VLAN untagging for PDMA rx
- make it possible to disable the feature via ethtool
- pass VLAN tag to the DSA driver
- untag special tag on PDMA only if no non-DSA devices are in use
- disable special tag untagging on 7986 for now, since it's not working yet
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
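One subtlety in the mtk_poll_rx() hunk below: with special-tag untagging enabled, the hardware moves the 4-byte MTK tag into the VLAN-offload fields, so skb->vlan_proto holds the first 16 bits of the special tag and its low three bits (GENMASK(2, 0)) identify the source switch port. A userspace-style sketch of the extraction; names and the sample value are illustrative.
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

static unsigned int port_from_pseudo_vlan(uint16_t vlan_proto_be)
{
        return ntohs(vlan_proto_be) & 0x7;      /* GENMASK(2, 0) in the patch */
}

int main(void)
{
        /* a frame from switch port 2 would carry 2 in the low bits */
        printf("port = %u\n", port_from_pseudo_vlan(htons(0x0002)));
        return 0;
}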
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -23,6 +23,7 @@
#include <linux/jhash.h>
#include <linux/bitfield.h>
#include <net/dsa.h>
+#include <net/dst_metadata.h>
#include "mtk_eth_soc.h"
#include "mtk_wed.h"
@@ -2021,16 +2022,22 @@ static int mtk_poll_rx(struct napi_struc
htons(RX_DMA_VPID(trxd.rxd4)),
RX_DMA_VID(trxd.rxd4));
} else if (trxd.rxd2 & RX_DMA_VTAG) {
- __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+ __vlan_hwaccel_put_tag(skb, htons(RX_DMA_VPID(trxd.rxd3)),
RX_DMA_VID(trxd.rxd3));
}
+ }
+
+ /* When using VLAN untagging in combination with DSA, the
+ * hardware treats the MTK special tag as a VLAN and untags it.
+ */
+ if (skb_vlan_tag_present(skb) && netdev_uses_dsa(netdev)) {
+ unsigned int port = ntohs(skb->vlan_proto) & GENMASK(2, 0);
- /* If the device is attached to a dsa switch, the special
- * tag inserted in VLAN field by hw switch can be offloaded
- * by RX HW VLAN offload. Clear vlan info.
- */
- if (netdev_uses_dsa(netdev))
- __vlan_hwaccel_clear_tag(skb);
+ if (port < ARRAY_SIZE(eth->dsa_meta) &&
+ eth->dsa_meta[port])
+ skb_dst_set_noref(skb, &eth->dsa_meta[port]->dst);
+
+ __vlan_hwaccel_clear_tag(skb);
}
skb_record_rx_queue(skb, 0);
@@ -2858,15 +2865,30 @@ static netdev_features_t mtk_fix_feature
static int mtk_set_features(struct net_device *dev, netdev_features_t features)
{
- int err = 0;
+ struct mtk_mac *mac = netdev_priv(dev);
+ struct mtk_eth *eth = mac->hw;
+ netdev_features_t diff = dev->features ^ features;
+ int i;
+
+ if ((diff & NETIF_F_LRO) && !(features & NETIF_F_LRO))
+ mtk_hwlro_netdev_disable(dev);
- if (!((dev->features ^ features) & NETIF_F_LRO))
+ /* Set RX VLAN offloading */
+ if (!(diff & NETIF_F_HW_VLAN_CTAG_RX))
return 0;
- if (!(features & NETIF_F_LRO))
- mtk_hwlro_netdev_disable(dev);
+ mtk_w32(eth, !!(features & NETIF_F_HW_VLAN_CTAG_RX),
+ MTK_CDMP_EG_CTRL);
- return err;
+ /* sync features with other MAC */
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
+ if (!eth->netdev[i] || eth->netdev[i] == dev)
+ continue;
+ eth->netdev[i]->features &= ~NETIF_F_HW_VLAN_CTAG_RX;
+ eth->netdev[i]->features |= features & NETIF_F_HW_VLAN_CTAG_RX;
+ }
+
+ return 0;
}
/* wait for DMA to finish whatever it is doing before we start using it again */
@@ -3163,11 +3185,45 @@ found:
return NOTIFY_DONE;
}
+static bool mtk_uses_dsa(struct net_device *dev)
+{
+#if IS_ENABLED(CONFIG_NET_DSA)
+ return netdev_uses_dsa(dev) &&
+ dev->dsa_ptr->tag_ops->proto == DSA_TAG_PROTO_MTK;
+#else
+ return false;
+#endif
+}
+
static int mtk_open(struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
- int err;
+ int i, err;
+
+ if (mtk_uses_dsa(dev) && !eth->prog) {
+ for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
+ struct metadata_dst *md_dst = eth->dsa_meta[i];
+
+ if (md_dst)
+ continue;
+
+ md_dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
+ GFP_KERNEL);
+ if (!md_dst)
+ return -ENOMEM;
+
+ md_dst->u.port_info.port_id = i;
+ eth->dsa_meta[i] = md_dst;
+ }
+ } else {
+ /* Hardware special tag parsing needs to be disabled if at least
+ * one MAC does not use DSA.
+ */
+ u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
+ val &= ~MTK_CDMP_STAG_EN;
+ mtk_w32(eth, val, MTK_CDMP_IG_CTRL);
+ }
err = phylink_of_phy_connect(mac->phylink, mac->of_node, 0);
if (err) {
@@ -3688,6 +3744,10 @@ static int mtk_hw_init(struct mtk_eth *e
*/
val = mtk_r32(eth, MTK_CDMQ_IG_CTRL);
mtk_w32(eth, val | MTK_CDMQ_STAG_EN, MTK_CDMQ_IG_CTRL);
+ if (!MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
+ val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
+ mtk_w32(eth, val | MTK_CDMP_STAG_EN, MTK_CDMP_IG_CTRL);
+ }
/* Enable RX VLan Offloading */
mtk_w32(eth, 1, MTK_CDMP_EG_CTRL);
@@ -3907,6 +3967,12 @@ static int mtk_free_dev(struct mtk_eth *
free_netdev(eth->netdev[i]);
}
+ for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
+ if (!eth->dsa_meta[i])
+ break;
+ metadata_dst_free(eth->dsa_meta[i]);
+ }
+
return 0;
}
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -22,6 +22,9 @@
#include <linux/bpf_trace.h>
#include "mtk_ppe.h"
+#define MTK_MAX_DSA_PORTS 7
+#define MTK_DSA_PORT_MASK GENMASK(2, 0)
+
#define MTK_QDMA_NUM_QUEUES 16
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
@@ -105,6 +108,9 @@
#define MTK_CDMQ_IG_CTRL 0x1400
#define MTK_CDMQ_STAG_EN BIT(0)
+/* CDMQ Egress Control Register */
+#define MTK_CDMQ_EG_CTRL 0x1404
+
/* CDMP Ingress Control Register */
#define MTK_CDMP_IG_CTRL 0x400
#define MTK_CDMP_STAG_EN BIT(0)
@@ -1164,6 +1170,8 @@ struct mtk_eth {
int ip_align;
+ struct metadata_dst *dsa_meta[MTK_MAX_DSA_PORTS];
+
struct mtk_ppe *ppe[2];
struct rhashtable flow_table;

View File

@@ -1,42 +0,0 @@
From: =?UTF-8?q?Ar=C4=B1n=C3=A7=20=C3=9CNAL?= <arinc.unal@arinc9.com>
Date: Sat, 28 Jan 2023 12:42:32 +0300
Subject: [PATCH] net: ethernet: mtk_eth_soc: disable hardware DSA untagging
for second MAC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging
won't work on the second MAC. Therefore, disable this feature when the
second MAC of the MT7621 and MT7623 SoCs is being used.
Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Link: https://lore.kernel.org/netdev/6249fc14-b38a-c770-36b4-5af6d41c21d3@arinc9.com/
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230128094232.2451947-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3201,7 +3201,8 @@ static int mtk_open(struct net_device *d
struct mtk_eth *eth = mac->hw;
int i, err;
- if (mtk_uses_dsa(dev) && !eth->prog) {
+ if ((mtk_uses_dsa(dev) && !eth->prog) &&
+ !(mac->id == 1 && MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_TRGMII))) {
for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
struct metadata_dst *md_dst = eth->dsa_meta[i];
@@ -3218,7 +3219,8 @@ static int mtk_open(struct net_device *d
}
} else {
/* Hardware special tag parsing needs to be disabled if at least
- * one MAC does not use DSA.
+ * one MAC does not use DSA, or the second MAC of the MT7621 and
+ * MT7623 SoCs is being used.
*/
u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
val &= ~MTK_CDMP_STAG_EN;

View File

@@ -1,54 +0,0 @@
From: =?UTF-8?q?Ar=C4=B1n=C3=A7=20=C3=9CNAL?= <arinc.unal@arinc9.com>
Date: Sun, 5 Feb 2023 20:53:31 +0300
Subject: [PATCH] net: ethernet: mtk_eth_soc: enable special tag when any MAC
uses DSA
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The special tag is only enabled when the first MAC uses DSA. However, it
must be enabled when any MAC uses DSA. Change the check accordingly.
This fixes hardware DSA untagging not working on the second MAC of the
MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the
check that disables hardware DSA untagging for the second MAC of the MT7621
and MT7623 SoCs.
Fixes: a1f47752fd62 ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC")
Co-developed-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3136,7 +3136,7 @@ static void mtk_gdm_config(struct mtk_et
val |= config;
- if (!i && eth->netdev[0] && netdev_uses_dsa(eth->netdev[0]))
+ if (eth->netdev[i] && netdev_uses_dsa(eth->netdev[i]))
val |= MTK_GDMA_SPECIAL_TAG;
mtk_w32(eth, val, MTK_GDMA_FWD_CFG(i));
@@ -3201,8 +3201,7 @@ static int mtk_open(struct net_device *d
struct mtk_eth *eth = mac->hw;
int i, err;
- if ((mtk_uses_dsa(dev) && !eth->prog) &&
- !(mac->id == 1 && MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_TRGMII))) {
+ if (mtk_uses_dsa(dev) && !eth->prog) {
for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
struct metadata_dst *md_dst = eth->dsa_meta[i];
@@ -3219,8 +3218,7 @@ static int mtk_open(struct net_device *d
}
} else {
/* Hardware special tag parsing needs to be disabled if at least
- * one MAC does not use DSA, or the second MAC of the MT7621 and
- * MT7623 SoCs is being used.
+ * one MAC does not use DSA.
*/
u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
val &= ~MTK_CDMP_STAG_EN;
